Claude Opus 4.6 plummets on BridgeBench after retest — hallucination rate nearly doubles and lab rivalry ignites
CLAUDE OPUS 4.6 IS NERFED. BridgeBench just proved it. Last week Claude Opus 4.6 ranked #2 on the Hallucination benchmark with an accuracy of 83.3%. Today Claude Opus 4.6 was retested and it fell to #10 on the leaderboard with an accuracy of only 68.3%. A roughly 90% increase in hallucination.
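The "nearly doubles" figure can be checked from the two accuracy scores alone. The sketch below assumes the hallucination rate is simply 100% minus accuracy, which the post does not state, so treat it as an illustration of the arithmetic and not as BridgeBench's own scoring method. The function names are hypothetical.

```python
def hallucination_rate(accuracy_pct: float) -> float:
    """Assumed definition: every non-accurate answer counts as a hallucination."""
    return 100.0 - accuracy_pct


def relative_increase(before: float, after: float) -> float:
    """Percentage change from `before` to `after`, relative to `before`."""
    return (after - before) / before * 100.0


# Accuracy scores quoted in the post.
before = hallucination_rate(83.3)  # last week's run
after = hallucination_rate(68.3)   # the retest

increase = relative_increase(before, after)
print(f"{before:.1f}% -> {after:.1f}%: {increase:.1f}% relative increase")
```

Under that assumption the rate goes from 16.7% to 31.7%, a relative increase of about 89.8%, or 15 percentage points in absolute terms.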
Also dominant that day
- @theo — Agent harnesses demystified (Theo builds one live) — Theo builds an agent harness on camera to prove the architecture isn't magic — builders pile on in agreement
- @NousResearch — Hermes Agent v0.9.0 Everywhere Release — Nous Research ships Hermes Agent v0.9.0 with local dashboard and Android support — open-source agent crowd lights up