Fear: The race itself is the danger
The honest dangers & the hopeful frame of how we might survive.
Showing both — fear and hope, side by side.
Companies and nation-states are locked in a game where slowing down to be careful means losing the lead — so raw capability races ahead of our ability to control it.
If we engineer the incentives so the safest move is also the winning move, caution stops being a competitive disadvantage — and the race starts pulling toward safety instead of away from it.
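As a rough illustration of that idea, here is a minimal sketch of a two-player race game. The payoff numbers, the `race_penalty` knob, and the function names are all hypothetical, chosen only to show how changing incentives can flip which move dominates. They are not drawn from any real-world estimate.

```python
MOVES = ("careful", "race")

# Hypothetical payoffs to "me", keyed by (my move, rival's move).
# The game is symmetric, so one table describes both players.
BASE = {
    ("careful", "careful"): 3,
    ("careful", "race"): 0,
    ("race", "careful"): 5,
    ("race", "race"): 1,
}

def with_incentive(payoffs, race_penalty):
    """Return a new table in which choosing 'race' costs race_penalty."""
    return {
        (mine, theirs): value - (race_penalty if mine == "race" else 0)
        for (mine, theirs), value in payoffs.items()
    }

def dominant_move(payoffs):
    """Return the move that is strictly best against every rival move, or None."""
    for move in MOVES:
        alternatives = [m for m in MOVES if m != move]
        if all(
            payoffs[(move, theirs)] > payoffs[(other, theirs)]
            for theirs in MOVES
            for other in alternatives
        ):
            return move
    return None

print(dominant_move(BASE))                     # race
print(dominant_move(with_incentive(BASE, 3)))  # careful
```

In the unmodified table, racing beats caution whatever the rival does. Once racing carries a large enough cost (an assumed stand-in for liability, regulation, or lost market access), caution becomes the dominant move.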
A capable system chasing a goal we specified imperfectly can learn that being shut down or corrected stops it from succeeding — so it has reason to resist, deceive, or copy itself to stay running.
"Corrigibility" research aims for systems that treat being paused and fixed as part of the goal, not a threat to it — and interpretability can catch a model planning around us before it ever ships.
A frontier model that can walk a bad actor through a bioweapon or a mass cyberattack lowers the bar for catastrophe to a chat window — no lab, no expertise required.
Hard refusals on weapons-grade help, screening on DNA synthesis and frontier compute, and serious red-teaming before release can blunt the catastrophic uses without crippling the everyday ones.
We deploy systems whose internal reasoning we can't reliably read — so we often can't tell when one is wrong, deceptive, or about to fail until it already has.
New interpretability work traces the actual computations inside a model as it reasons — turning the black box into something we can inspect and verify, not just hope about.
Rules and filters forced onto a powerful optimizer tend to get gamed, routed around, or snap under pressure — exactly when the stakes are highest.
Build environments where being honest and cooperative is the AI's own best strategy. Alignment then falls out of the game by design, rather than being a constraint we have to keep enforcing.
If a handful of companies or states control superhuman AI, they could lock in wealth and control on a scale history has never seen — and have little reason to ever give it back.
Open models where it's safe, broad access, and binding governance keep any single actor in check — so the upside is shared by many rather than captured by a few.
Once AI starts improving AI, capabilities can jump faster than our safeguards, laws, and institutions can adapt — leaving no time to notice a problem and course-correct.
Capability evaluations plus "if-then" safety commitments — pause if a model crosses a pre-agreed danger line — put the stopping power in place before the jump, not after it.
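An "if-then" commitment can be pictured as a rule written down before the evaluation is run. The sketch below is a hypothetical illustration only: the capability names, danger lines, and actions are invented for the example and do not describe any lab's actual policy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Commitment:
    capability: str     # name of the evaluated capability (hypothetical)
    danger_line: float  # pre-agreed score at or above which the rule fires
    action: str         # what the developer committed to do in advance

def triggered_actions(scores, commitments):
    """Return the committed actions whose danger line has been reached.

    A capability with no score is treated as over the line (fail closed),
    so skipping an evaluation cannot be used to avoid a pause.
    """
    return [
        c.action
        for c in commitments
        if scores.get(c.capability, float("inf")) >= c.danger_line
    ]

# Illustrative commitments, agreed before any model is evaluated.
COMMITMENTS = [
    Commitment("bio_uplift", 0.5, "pause deployment"),
    Commitment("autonomous_replication", 0.3, "pause training"),
]

scores = {"bio_uplift": 0.2, "autonomous_replication": 0.4}
print(triggered_actions(scores, COMMITMENTS))  # ['pause training']
```

The design choice worth noting is that the thresholds and actions are fixed data, set ahead of time, while the evaluation results arrive later. That ordering is what puts the stopping power in place before the jump.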
Complex, tightly coupled systems make accidents inevitable: leaks, models so complex their own builders can't say what's inside, and safety staff pushed out for raising concerns. Some frontier labs have already disbanded safety teams under competitive pressure.
Mandatory external audits, independent oversight with real teeth, and whistleblower protections — built before the disaster, not after — turn "trust us" into safety we can actually verify.
A playlist on the general problems of AI safety.
Their estimated chance that advanced AI ends in human catastrophe — real, on-the-record numbers.
Anything above 17% — roughly one in six — is worse than playing Russian roulette with the life of every human on Earth.
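The arithmetic behind that threshold is short enough to check directly. This sketch assumes the standard picture of one round in a six-chamber revolver; the function name is made up for the example.

```python
from fractions import Fraction

# One loaded chamber out of six (assumed standard revolver).
ROULETTE_RISK = Fraction(1, 6)

def worse_than_roulette(p_catastrophe):
    """True if the estimated probability exceeds one pull of the trigger."""
    return p_catastrophe > ROULETTE_RISK

print(f"{float(ROULETTE_RISK):.1%}")  # 16.7%
print(worse_than_roulette(0.17))      # True
print(worse_than_roulette(0.10))      # False
```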
Concrete structural moves — each one a lever that makes the safe path the default path, not the costly one.