Monday, April 13, 2026

Claude Opus 4.6 plummets on BridgeBench after retest — hallucination rate nearly doubles and lab rivalry ignites

Claude Opus 4.6 nerfed on BridgeBench as Grok takes #1

CLAUDE OPUS 4.6 IS NERFED. BridgeBench just proved it. Last week Claude Opus 4.6 ranked #2 on the Hallucination benchmark with an accuracy of 83.3%. Today Claude Opus 4.6 was retested and it fell to #10 on the leaderboard with an accuracy of only 68.3%. That is roughly a 90% increase in hallucination rate.
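
The size of the jump can be checked from the two accuracy figures quoted above. This is a minimal sketch, assuming the hallucination rate is simply 100% minus accuracy. BridgeBench's actual scoring method is not described here, and the function names are illustrative only.

```python
def hallucination_rate(accuracy_pct: float) -> float:
    """Assumed definition: whatever is not accurate counts as hallucination."""
    return 100.0 - accuracy_pct


def relative_increase_pct(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a percentage of `before`."""
    return (after - before) / before * 100.0


# Accuracy figures quoted in the post.
before = hallucination_rate(83.3)  # last week's run
after = hallucination_rate(68.3)   # the retest
increase = relative_increase_pct(before, after)

print(f"hallucination rate: {before:.1f}% -> {after:.1f}%")
print(f"relative increase: {increase:.1f}%")
```

Under that assumption the rate goes from 16.7% to 31.7%, a relative increase of about 89.8%, which fits the headline's "nearly doubles".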

Also dominant that day

  • @theo Agent harnesses demystified (Theo builds one live) — Theo builds an agent harness on camera to prove the architecture isn't magic — builders pile on in agreement
  • @NousResearch Hermes Agent v0.9.0 Everywhere Release — Nous Research ships Hermes Agent v0.9.0 with local dashboard and Android support — open-source agent crowd lights up