
The Safety Plan That Eats Itself - Why “Use AI to Make AI Safe” turns into a leadership test during crunch time

Markus Brinsa | February 23, 2026 | 4 min read


The plan everyone repeats because it sounds scalable

Every major AI company ends up circling the same idea: when systems become dangerously capable, the systems themselves will help solve alignment, security, and control. The pitch is attractive because it scales. It’s also unsettling because it’s circular. The tool that creates the risk is also the tool you’re counting on to manage it.

In the February 17, 2026 episode of the 80,000 Hours podcast, Ajeya Cotra treats this neither as a meme nor as a morality play, but as a timing problem. If we enter a period where AI can automate meaningful chunks of AI research, the world may get a narrow “crunch time” window in which the core task is to redirect vast amounts of AI labor away from capability and toward safety, security, and societal hardening. She frames that window as potentially very short, on the order of a year or less.

That is the first executive takeaway: this is not a debate about whether safety work matters. It’s a question of whether safety work can be made to move at the same rate as capability work when the incentives are screaming in the opposite direction.

The safety-plan paradox and how it quietly creates risk

The safety-plan paradox is what happens when the existence of a plan becomes evidence that the risk is handled.

Plans are helpful. They create shared language, funding pathways, and accountability structures. But in fast-moving systems, a plan can also become a psychological product. It reduces internal anxiety and external scrutiny. It becomes something you can point to. And then it starts replacing the harder work: runtime control, continuous validation, and the willingness to delay release when conditions are hostile.

The paradox isn’t that people are dishonest. The paradox is that planning is rewarded immediately, while safety outcomes are probabilistic, delayed, and embarrassing to measure. You can ship a plan this quarter. You can only prove safety under adversarial use over time, and “over time” is exactly what crunch time steals from you.

Why “use AI to make AI safe” is not a safety strategy by itself

Cotra’s framing makes the circular plan look less like an absurdity and more like an incomplete answer. If you imagine a world where automated AI R&D accelerates progress, the only reason “AI for safety” helps is that it can increase safety throughput. It can assist with evaluation, monitoring, red-teaming, vulnerability discovery, interpretability work, defensive cyber, and rapid iteration on safeguards. The point is leverage.

But leverage is not governance. Leverage does not answer who decides what “safe enough” means, who has veto power, and what happens when a commercial roadmap collides with an ugly safety finding.

That’s where the plan starts to eat itself. The more you rely on AI to keep up, the more you create pressure to deploy the most capable systems earlier, because those are the systems that supposedly make the safety work possible.

The failure mode nobody budgets for

The most executive-relevant warning in this episode is not a technical limitation. It’s a follow-through problem.

Even if there are clever techniques that make it feasible to use advanced AI systems to improve safety, the plan can still fail because organizations don’t make measurable, binding commitments about what they will actually do when crunch time arrives. Cotra and Rob Wiblin call out the absence of quantitative commitments about what fraction of AI labor will be redirected toward alignment and security when things get urgent, and the possibility that competitive pressure during recursive improvement becomes irresistible.

This is where governance stops being a deck and becomes a control plane. In real companies, “we will prioritize safety later” is not a plan. It’s an intention waiting to be renegotiated under stress.

What serious safety leadership looks like in this framing

If you take the episode seriously, the practical implications for leaders are uncomfortable and actionable.

A credible safety posture treats the plan as disposable and the instrumentation as sacred. The plan will be wrong, because the system will drift. The threat surface will mutate. The deployment context will surprise you. What matters is whether you can detect drift, detect misuse, and detect capability jumps early enough to change behavior.

And credibility comes from decision rights. If nobody has the authority to slow deployment when validation fails, you do not have governance. You have documentation with a nice cover slide.

In crunch time, the best safety plan is the one that already converted “later” into concrete triggers, budgets, and vetoes before incentives get loud.

The strategic irony inside the circular plan

There’s a second paradox hiding underneath the first. Using AI to improve safety is arguably necessary if safety work needs to scale fast. But it also increases the strategic stakes around who controls the safety tooling, who defines the benchmarks, and who gets to announce success. If safety becomes part of competitive positioning, the temptation to grade your own exam goes up, not down.

So the real executive question becomes: do you have an independent way to validate safety claims, under adversarial assumptions, and do you have governance mechanisms that survive competitive pressure?

Because in the world Cotra is describing, safety is not a communications problem. It’s a time-compressed organizational integrity problem.

The leadership takeaway

“Use AI to make AI safe” is not crazy because it’s circular. It’s dangerous because it’s easy to mistake for a complete plan.

The lesson from this episode is that safety must be built as an operating system: continuous evaluation, runtime enforcement, early warning signals, and decision rights that can halt or redirect work when conditions change.

If your safety story cannot survive crunch time incentives, it was never a safety story. It was a comfort story.

About the Author

Markus Brinsa is the Founder & CEO of SEIKOURI Inc., an international strategy firm that gives enterprises and investors human-led access to pre-market AI—then converts first looks into rights and rollouts that scale. As an AI Risk & Governance Strategist, he created "Chatbots Behaving Badly," a platform and podcast that investigates AI’s failures, risks, and governance. With over 30 years of experience bridging technology, strategy, and cross-border growth in the U.S. and Europe, Markus partners with executives, investors, and founders to turn early signals into a durable advantage.

© 2026 Markus Brinsa | brinsa.com™