Counter-evidence · the times we were wrong

Public accountability ledger.

A public list of cases where the council's probability estimate was meaningfully wrong, and cases where the council refused trades that later moved against the refusal. No filtering. No cherry-picking. If you can't see the failures, you can't trust the wins.

§1 · Backtest mis-calibration (top 5 worst Brier)

Where the 200-market backtest got it most wrong.

Top 5 cases by (council_p − outcome)² from artifacts/sources_brier_20260517T181232Z.json. Each row gives the council's probability, the actual outcome, and the resulting Brier score; the notes under the table record what specifically the council missed.

| Question | Category | Council p | Actual | Brier |
|---|---|---|---|---|
| Will the FTC successfully block the proposed acquisition by Q3 2026? | Regulatory | 82% | NO | 0.672 |
| Will US headline CPI print above 3.5% YoY in the May 2026 release? | Macro | 71% | NO | 0.504 |
| Will the SEC approve at least one spot Ethereum ETF by 2026-07-31? | Crypto · regulatory | 18% | YES | 0.672 |
| Will any major US tech firm announce 10,000+ layoffs in June 2026? | Macro · labor | 34% | YES | 0.436 |
| Will Argentina exit the IMF program before end of 2026 Q3? | Sovereign | 9% | YES | 0.828 |

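
The Brier column can be reproduced from the other two numeric columns. A minimal sketch, assuming YES is coded as 1 and NO as 0; the `brier` helper and the tuple layout are illustrative and are not the schema of the JSON artifact:

```python
# Per-market squared error, (council_p - outcome)^2, for the five rows above.
# Assumption: a YES resolution is coded as 1.0 and a NO resolution as 0.0.

def brier(council_p: float, resolved_yes: bool) -> float:
    """Squared distance between the forecast and the binary outcome."""
    outcome = 1.0 if resolved_yes else 0.0
    return (council_p - outcome) ** 2

# (category label, council probability, resolved YES?)
ROWS = [
    ("Regulatory", 0.82, False),
    ("Macro", 0.71, False),
    ("Crypto · regulatory", 0.18, True),
    ("Macro · labor", 0.34, True),
    ("Sovereign", 0.09, True),
]

# Rounded to three decimals to match the table.
SCORES = {label: round(brier(p, yes), 3) for label, p, yes in ROWS}

if __name__ == "__main__":
    # Print worst first.
    for label, score in sorted(SCORES.items(), key=lambda kv: -kv[1]):
        print(f"{label:<22} {score:.3f}")
```

Note that an 82% forecast that resolves NO and an 18% forecast that resolves YES are the same distance from the truth, which is why two rows share the score 0.672.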
What the council missed — verbatim
  • Regulatory · council 82% → actual NO

    Council weighted historical FTC block rate (61%) without conditioning on the specific commissioner composition. Three of five commissioners had publicly stated pro-merger views during nomination hearings — a fact in the public record that no agent surfaced.

  • Macro · council 71% → actual NO

    Energy basis collapse in late April was visible in real-time DEX swap volumes but Apollo's macro_basis feature is currently HOLD (insufficient sample for ADOPT). Council relied on the lagged FRED CPI nowcast.

  • Crypto · regulatory · council 18% → actual YES

    Cassandra correctly flagged the catalyst window. Athena's synthesis weighted Hades's structural objection too heavily. The market was already pricing 0.62 — a 44 percentage point gap that we treated as edge in the wrong direction. Calibration on regulatory windows is a known weakness.

  • Macro · labor · council 34% → actual YES

    No single source bridged macro labor (FRED) with sector-specific guidance (technical lead/lag). Both features are HOLD on Brier-delta. Adding Polymarket-flavoured falsification may surface this signal.

  • Sovereign · council 9% → actual YES

    Eris (adversarial dissenter) raised the counter-case in Round 2 but with insufficient specificity. The vote-weight policy gave Eris 0.8x against Hades's 2x — adversarial dissent was outweighed by structural caution. Tuning Eris's weight upward in Q4 is on the open issues list.
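
To make the weighting effect in the last note concrete, here is a minimal sketch that assumes the synthesis is a plain weight-normalised average of agent probabilities. Only the 0.8x and 2x weights come from the note; the aggregation rule, the `weighted_estimate` helper, and the agent probabilities are hypothetical and chosen only to show the direction of the effect.

```python
def weighted_estimate(votes: dict[str, float], weights: dict[str, float]) -> float:
    """Weight-normalised average of per-agent probabilities (assumed rule)."""
    total_weight = sum(weights[name] for name in votes)
    return sum(votes[name] * weights[name] for name in votes) / total_weight

# Hypothetical agent probabilities, NOT taken from the backtest artifact.
votes = {"Eris": 0.60, "Hades": 0.05}

# Weights quoted in the note: Eris 0.8x, Hades 2x.
current = weighted_estimate(votes, {"Eris": 0.8, "Hades": 2.0})

# Same votes with equal weights, for comparison.
equal = weighted_estimate(votes, {"Eris": 1.0, "Hades": 1.0})

if __name__ == "__main__":
    print(f"current weights: {current:.3f}")
    print(f"equal weights:   {equal:.3f}")
```

Under these illustrative inputs the quoted weights pull the estimate further toward the structurally cautious agent than equal weights would. Whether raising Eris's weight improves calibration overall is a separate question that only the backtest can answer.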

Aggregate context: these 5 cases have Brier scores between 0.436 and 0.828, against a full-sample mean of 0.149. They are tail mis-calibrations, not representative cases, but they are the ones a critic should know about before forming an opinion.
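
A quick arithmetic check on how much the five cases move the aggregate. This sketch assumes the 0.149 mean is taken over all 200 backtest markets; the page calls it the full-sample mean without restating the sample size, so `N_MARKETS` is an assumption.

```python
# Unrounded squared errors for the five tail cases listed in section 1.
TAIL = [0.82**2, 0.71**2, 0.82**2, 0.66**2, 0.91**2]

N_MARKETS = 200    # assumption: the mean covers the whole 200-market backtest
FULL_MEAN = 0.149  # full-sample mean Brier quoted above

tail_sum = sum(TAIL)
total = FULL_MEAN * N_MARKETS

# Share of the total squared error that comes from the five tail cases.
tail_share = tail_sum / total

# Mean Brier over the remaining markets once the tail is removed.
rest_mean = (total - tail_sum) / (N_MARKETS - len(TAIL))

if __name__ == "__main__":
    print(f"tail share of total error: {tail_share:.1%}")
    print(f"mean Brier without tail:   {rest_mean:.3f}")
```

Under that assumption the five cases are 2.5% of the markets but account for roughly a tenth of the total squared error, and the mean over the other 195 markets comes out near 0.137.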

§2 · Restraint-reversal queue

Refused trades that moved against the refusal.

Every time the council refuses a trade, the refusal is anchored on Arc Testnet as a Proof of Restraint record. When the underlying market later resolves, we can audit whether the refusal saved money or cost opportunity. This is the no-trade-alpha accountability ledger — and the most important measurement of the system.

Anchored restraints: 20
Resolved + reversed: 0
Reversal rate: 0.0%

The queue is intentionally empty. None of the 20 anchored restraints have aged through resolution yet — most carry resolution dates 30+ days out. When they begin resolving, this section will populate automatically from on-chain log scans. No edits, no manual curation. If the reversal rate climbs above 50%, the system is worse than coin-flipping on what to refuse, and this page will say so prominently.
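
The headline figures reduce to a small calculation. A sketch, assuming the reversal rate is reversed refusals divided by resolved refusals and is reported as 0.0% while nothing has resolved; the `Restraint` record and its fields are hypothetical and do not reflect the on-chain schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Restraint:
    """Hypothetical record for one anchored refusal."""
    market_id: str
    # None = market not yet resolved; True = market moved against the refusal.
    reversed_: Optional[bool] = None

def reversal_rate(restraints: list) -> float:
    """Reversed / resolved, defined as 0.0 when nothing has resolved yet."""
    resolved = [r for r in restraints if r.reversed_ is not None]
    if not resolved:
        return 0.0
    return sum(1 for r in resolved if r.reversed_) / len(resolved)

ALERT_THRESHOLD = 0.50  # the 50% line named in the text

# Current state per the page: 20 anchored restraints, none resolved.
queue = [Restraint(f"m{i}") for i in range(20)]
rate = reversal_rate(queue)
alert = rate > ALERT_THRESHOLD

if __name__ == "__main__":
    print(f"reversal rate: {rate:.1%}, alert: {alert}")
```

Returning 0.0 for an empty denominator matches the figure shown on the page, but it is a design choice: reporting the rate as undefined until the first resolution would avoid reading 0.0% as evidence that refusals were correct.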

The methodology behind these numbers lives at /methodology. The raw artifact is on GitHub. If a row here is missing or wrong, open an issue.