Public accountability ledger.
A public list of cases where the council's probability estimate was meaningfully wrong, and cases where the council refused trades that later moved against the refusal. No filtering. No cherry-picking. If you can't see the failures, you can't trust the wins.
Where the 200-market backtest got it most wrong.
Top 5 cases ranked by (council_p − outcome)², with outcome coded 1 for YES and 0 for NO, from artifacts/sources_brier_20260517T181232Z.json. Each row gives what the council predicted and what actually happened; the notes below the table say what specifically the council missed.
| Question | Category | Council p | Actual | Brier |
|---|---|---|---|---|
| Will the FTC successfully block the proposed acquisition by Q3 2026? | Regulatory | 82% | NO | 0.672 |
| Will US headline CPI print above 3.5% YoY in the May 2026 release? | Macro | 71% | NO | 0.504 |
| Will the SEC approve at least one spot Ethereum ETF by 2026-07-31? | Crypto · regulatory | 18% | YES | 0.672 |
| Will any major US tech firm announce 10,000+ layoffs in June 2026? | Macro · labor | 34% | YES | 0.436 |
| Will Argentina exit the IMF program before end of 2026 Q3? | Sovereign | 9% | YES | 0.828 |
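The Brier column can be recomputed from the council probability and the resolved outcome. The sketch below assumes outcome is coded 1 for YES and 0 for NO; the `brier` helper and the short row labels are illustrative names, not fields from the artifact file.

```python
def brier(p: float, resolved_yes: bool) -> float:
    """Squared error between a probability and a binary outcome."""
    outcome = 1.0 if resolved_yes else 0.0
    return (p - outcome) ** 2

# (label, council probability, resolved YES?) copied from the table above.
rows = [
    ("FTC block", 0.82, False),
    ("CPI above 3.5%", 0.71, False),
    ("Spot ETH ETF", 0.18, True),
    ("Tech layoffs", 0.34, True),
    ("Argentina IMF exit", 0.09, True),
]

# Round to three decimals to match the table's precision.
scores = {label: round(brier(p, yes), 3) for label, p, yes in rows}

# Largest error first.
ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
for label, score in ranked:
    print(f"{label}: {score:.3f}")
```

Note that a confident miss in either direction is penalised the same way: 82% on a NO and 18% on a YES produce the same score.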
Regulatory · council 82% → actual NO
Council weighted historical FTC block rate (61%) without conditioning on the specific commissioner composition. Three of five commissioners had publicly stated pro-merger views during nomination hearings — a fact in the public record that no agent surfaced.
Macro · council 71% → actual NO
Energy basis collapse in late April was visible in real-time DEX swap volumes but Apollo's macro_basis feature is currently HOLD (insufficient sample for ADOPT). Council relied on the lagged FRED CPI nowcast.
Crypto · regulatory · council 18% → actual YES
Cassandra correctly flagged the catalyst window. Athena's synthesis weighted Hades's structural objection too heavily. The market was already pricing 0.62, 44 percentage points above the council's 18%, and we treated that gap as edge in the wrong direction. Calibration on regulatory windows is a known weakness.
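To make the size of this miss concrete, the council's score on this question can be compared with the score the market price would have earned. This is a minimal sketch: defining edge as council probability minus market price is an assumption made here for illustration, not a definition taken from the methodology.

```python
council_p = 0.18   # council estimate for this question
market_p = 0.62    # market price quoted in the note above
outcome = 1.0      # resolved YES

# Assumed definition of edge: council estimate minus market price.
# A negative value means the council sat below the market.
edge = council_p - market_p

council_brier = (council_p - outcome) ** 2
market_brier = (market_p - outcome) ** 2

print(f"edge: {edge:+.2f}")
print(f"council Brier: {council_brier:.3f}, market Brier: {market_brier:.3f}")
```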
Macro · labor · council 34% → actual YES
No single source bridged macro labor (FRED) with sector-specific guidance (technical lead/lag). Both features are HOLD on Brier-delta. Adding Polymarket-flavoured falsification may surface this signal.
Sovereign · council 9% → actual YES
Eris (adversarial dissenter) raised the counter-case in Round 2 but with insufficient specificity. The vote-weight policy gave Eris 0.8x against Hades's 2x — adversarial dissent was outweighed by structural caution. Tuning Eris's weight upward in Q4 is on the open issues list.
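The effect of the weight imbalance can be illustrated with a weighted linear average of per-agent probabilities. This is only a sketch: the source does not say how the council pools votes, so the pooling rule is an assumption, and the per-agent probabilities and the alternative Eris weight of 1.5 are hypothetical values invented for the example. Only the 2x and 0.8x weights come from the note above.

```python
def weighted_pool(votes: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted linear average of per-agent probabilities (assumed pooling rule)."""
    total_weight = sum(weights[name] for name in votes)
    return sum(votes[name] * weights[name] for name in votes) / total_weight

# Hypothetical per-agent probabilities, invented for illustration only.
votes = {"Hades": 0.05, "Eris": 0.40}

# Weights quoted in the note: Hades 2x, Eris 0.8x.
current = weighted_pool(votes, {"Hades": 2.0, "Eris": 0.8})

# Same votes with a larger, arbitrarily chosen Eris weight.
retuned = weighted_pool(votes, {"Hades": 2.0, "Eris": 1.5})

print(f"current weights: {current:.2f}, retuned weights: {retuned:.2f}")
```

Under these made-up inputs, raising the dissenter's weight moves the pooled estimate toward the dissent, but only modestly. How far the real council estimate would move depends on the actual pooling rule and votes.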
Aggregate context: these 5 cases sit at Brier 0.44–0.83. The full-sample mean is 0.149, so these are tail miscalibrations, not representative cases. But they are the ones a critic should know about before forming an opinion.
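The weight of these five cases in the aggregate can be estimated from the figures on this page. The sketch below assumes the 0.149 mean is taken over the same 200 markets as the backtest, and it uses the rounded scores from the table, so the results are approximate.

```python
tail_scores = [0.672, 0.504, 0.672, 0.436, 0.828]  # Brier column from the table
n_markets = 200      # assumed sample size, from the 200-market backtest
full_mean = 0.149    # full-sample mean quoted above

total_error = full_mean * n_markets

tail_mean = sum(tail_scores) / len(tail_scores)
tail_share = sum(tail_scores) / total_error  # fraction of total squared error
rest_mean = (total_error - sum(tail_scores)) / (n_markets - len(tail_scores))

print(f"tail mean: {tail_mean:.3f}")
print(f"tail share of total error: {tail_share:.1%}")
print(f"mean over remaining markets: {rest_mean:.3f}")
```

Under these assumptions the five rows average about 0.62 and account for roughly a tenth of the total squared error while making up 2.5% of the markets.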
Refused trades that moved against the refusal.
Every time the council refuses a trade, the refusal is anchored on Arc Testnet as a Proof of Restraint record. When the underlying market later resolves, we can audit whether the refusal saved money or cost opportunity. This is the no-trade-alpha accountability ledger — and the most important measurement of the system.
The queue is intentionally empty. None of the 20 anchored restraints have aged through resolution yet — most carry resolution dates 30+ days out. When they begin resolving, this section will populate automatically from on-chain log scans. No edits, no manual curation. If the reversal rate climbs above 50%, the system is worse than coin-flipping on what to refuse, and this page will say so prominently.
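The 50% rule above can be sketched as a small check over resolved restraint records. The record shape, field names, and function names below are hypothetical; the real records come from on-chain log scans whose schema is not described here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResolvedRestraint:
    """Hypothetical shape of a resolved Proof of Restraint record."""
    record_id: str
    moved_against_refusal: bool  # True if the refused trade would have paid off

def reversal_rate(resolved: list[ResolvedRestraint]) -> Optional[float]:
    """Fraction of resolved refusals that the market moved against.

    Returns None while nothing has resolved, which matches the empty queue.
    """
    if not resolved:
        return None
    reversals = sum(r.moved_against_refusal for r in resolved)
    return reversals / len(resolved)

def needs_prominent_warning(rate: Optional[float], threshold: float = 0.5) -> bool:
    """True only when a rate exists and is strictly above the threshold."""
    return rate is not None and rate > threshold

# Made-up records for illustration only.
sample = [
    ResolvedRestraint("r1", True),
    ResolvedRestraint("r2", True),
    ResolvedRestraint("r3", False),
]
print(reversal_rate([]), reversal_rate(sample))
```

Returning None for an empty queue, rather than 0.0, keeps "nothing has resolved yet" distinct from "no refusal has been reversed".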
The methodology behind these numbers lives at /methodology. The raw artifact is on GitHub. If a row here is missing or wrong, open an issue.