Anthropic’s approach to AI development has quietly shifted, marking another pivot in the industry’s balance between innovation and caution. The company, once noted for its strong stance on safety, has revised its Responsible Scaling Policy (RSP) to remove a core commitment: halting model training if risks outpace safeguards.
Under previous versions of the RSP, Anthropic pledged to pause development when AI systems approached dangerous capability thresholds, particularly in areas with potential for catastrophic misuse. The commitment amounted to a hard stop: scaling would not proceed until adequate safety procedures were in place. Version 3.0 of the policy replaces this language with broader, less binding terms centered on “responsible development,” “risk management,” and “iterative deployment.”
The revised policy no longer commits the company to halting development when risks escalate. Instead, it emphasizes safeguards, public safety evaluations, and updates to a Frontier Safety Framework, though without the enforceable constraints of earlier versions.
Industry analysts suggest this change reflects a broader trend where competitive pressure outweighs unilateral safety commitments. If one company pauses development while others advance without strong mitigations, the argument goes, the result could be a less safe landscape overall. Anthropic’s chief science officer has noted that rapid AI advancement makes such commitments difficult to sustain unilaterally.
While the updated policy introduces new transparency measures, such as publicly shareable roadmaps and risk reports, it falls short of the earlier promise to halt development in high-risk scenarios. The shift raises the question of whether industry-wide safety can be achieved without mandatory pauses, even if individual companies remain committed to risk management.
Key Changes:
- Removal of explicit “pause” commitments for model training when risks escalate
- Focus on “responsible development” and iterative deployment over rigid halts
- New emphasis on public safety evaluations and risk transparency
- Retention of the Frontier Safety Framework, but without enforceable thresholds
The move underscores the tension between advancing AI capabilities and maintaining safety standards. As competitors accelerate development, the question remains whether voluntary measures will suffice—or if industry-wide coordination is needed to prevent a race to the bottom in AI safety.