In 2023, leading AI companies signed voluntary safety commitments. In 2024, the EU adopted legally binding GPAI provisions. In 2025, safety researchers began evaluating whether any of it was working. The evidence so far is not encouraging: voluntary commitments are inconsistently implemented, and legally binding requirements are not yet operational. The frontier AI safety regime is caught between two inadequate models — voluntary promises that lack enforcement and legal mandates that lack implementation.
Evaluating Safety Frameworks
Stelling et al. (2025) systematically evaluate the frontier safety frameworks published by leading AI companies. These frameworks — internal documents that specify which safety evaluations the company will conduct, which thresholds will trigger additional precautions, and which governance structures oversee safety decisions — vary enormously in specificity, ambition, and accountability.
Some companies publish detailed capability evaluation protocols with clear escalation procedures. Others publish vague commitments to "responsible development" without specifying what that means operationally. The evaluation finds that the strongest frameworks include pre-deployment risk assessments, staged release protocols, external red-teaming, and documented decision-making processes. The weakest are essentially marketing documents that use safety language without creating enforceable obligations.
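To make the contrast concrete, the sketch below shows how a capability-threshold escalation rule of the kind the stronger frameworks describe might be encoded. It is illustrative only: the capability names, scores, thresholds, and required precautions are assumptions for the example, not taken from any published framework.

```python
from dataclasses import dataclass

@dataclass
class CapabilityThreshold:
    """An illustrative escalation rule: if an evaluation score crosses the
    threshold, the listed precautions must be completed before deployment."""
    capability: str          # e.g. "autonomous cyber-offense" (hypothetical)
    eval_score: float        # score from the pre-deployment evaluation
    threshold: float         # trigger level defined in the framework
    precautions: list[str]   # actions required once the threshold is crossed

def required_precautions(results: list[CapabilityThreshold]) -> dict[str, list[str]]:
    """Return the precautions triggered by each evaluation that crossed its threshold."""
    return {
        r.capability: r.precautions
        for r in results
        if r.eval_score >= r.threshold
    }

# Hypothetical pre-deployment evaluation results.
results = [
    CapabilityThreshold("autonomous cyber-offense", 0.62, 0.50,
                        ["external red-team review", "staged release", "board sign-off"]),
    CapabilityThreshold("biological uplift", 0.18, 0.40,
                        ["external red-team review", "deployment hold"]),
]

print(required_precautions(results))
# {'autonomous cyber-offense': ['external red-team review', 'staged release', 'board sign-off']}
```

A framework is only as strong as its weakest clause: vague capability definitions or discretionary thresholds leave exactly the flexibility the next paragraph describes.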
The variation exposes a fundamental problem with the voluntary approach: without external accountability, companies face incentives to publish frameworks that appear comprehensive while preserving enough flexibility to avoid constraining commercially valuable development paths.
Risk Management Frameworks
Campos et al. (2025) propose a structured frontier AI risk management framework that attempts to bridge the gap between voluntary and mandatory approaches. Their framework categorizes risks by type (capability risks, alignment risks, misuse risks, systemic risks), specifies evaluation methodologies for each category, and proposes governance structures that include both internal oversight and external review.
The framework's contribution is practical: it provides a detailed template that companies can adopt and regulators can use as a benchmark. Rather than prescribing specific technical requirements (which become outdated as technology evolves), it prescribes a process — risk identification, assessment, mitigation, monitoring, and reporting — that can accommodate technological change while maintaining accountability.
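A rough sketch of how that process could be represented is below: a risk register entry that carries each identified risk through assessment, mitigation, monitoring, and reporting. The risk categories mirror those named above; the field names, severity scale, and status values are assumptions made for illustration rather than anything specified by Campos et al.

```python
from dataclasses import dataclass, field
from enum import Enum

class RiskCategory(Enum):
    CAPABILITY = "capability"
    ALIGNMENT = "alignment"
    MISUSE = "misuse"
    SYSTEMIC = "systemic"

class Stage(Enum):
    IDENTIFIED = "identified"
    ASSESSED = "assessed"
    MITIGATED = "mitigated"
    MONITORED = "monitored"
    REPORTED = "reported"

@dataclass
class RiskEntry:
    """One row of an illustrative risk register."""
    description: str
    category: RiskCategory
    severity: int = 0                                  # set during assessment (1-5, hypothetical scale)
    mitigations: list[str] = field(default_factory=list)
    stage: Stage = Stage.IDENTIFIED
    history: list[Stage] = field(default_factory=list)

    def advance(self, next_stage: Stage) -> None:
        """Move the risk to the next stage, keeping an auditable trail."""
        self.history.append(self.stage)
        self.stage = next_stage

# Hypothetical usage: identify, assess, mitigate, then hand off to monitoring.
risk = RiskEntry("model assists with exploit development", RiskCategory.MISUSE)
risk.severity = 4
risk.advance(Stage.ASSESSED)
risk.mitigations.append("refusal training plus usage monitoring")
risk.advance(Stage.MITIGATED)
risk.advance(Stage.MONITORED)
print(risk.stage, [s.value for s in risk.history])
```

The point of a process-based template is visible even in this toy version: the specific evaluations and mitigations can change as the technology does, while the audit trail and reporting obligations stay constant.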
The Regulatory Question
Radanliev (2025), in Frontiers in Political Science, addresses the broader regulatory question: what form should frontier AI regulation take? The analysis considers several models — self-regulation (current US approach), comprehensive legislation (EU approach), international treaty (proposed but not realized), and regulatory agency (modeled on nuclear or aviation safety authorities).
Each model has limitations for frontier AI. Self-regulation lacks accountability. Comprehensive legislation lacks adaptability. International treaties require consensus that does not exist. Regulatory agencies require technical expertise that is concentrated in the companies being regulated. The most promising approach, Radanliev argues, involves a combination: mandatory transparency requirements (companies must disclose safety evaluations), independent audit mechanisms (external parties verify compliance), and adaptive regulatory powers (regulators can update requirements as technology evolves without new legislation).
The transition from voluntary to enforceable safety governance is the defining regulatory challenge for frontier AI. The technology is advancing faster than governance institutions can adapt, and the consequences of getting the transition wrong — either stifling beneficial innovation through excessive regulation or enabling harmful deployment through insufficient oversight — are substantial in both directions.
The Expertise Problem
A distinctive challenge for frontier AI safety governance is the concentration of relevant expertise. The people who best understand the capabilities and risks of frontier AI systems are the researchers who build them — and they work for the companies being regulated. This creates a structural information asymmetry that complicates every governance model.
Self-regulation suffers because the regulated entity controls the information needed to evaluate compliance. External regulation suffers because regulators lack the technical expertise to meaningfully oversee frontier AI development. Hybrid models — where companies disclose safety information and external parties audit it — partially address the asymmetry but require external auditors with sufficient expertise to evaluate complex AI systems, and this expertise is scarce.
The talent pipeline between AI companies and regulatory bodies needs to flow in both directions. Regulators who have never built a frontier AI system lack the intuitions needed to distinguish genuine risks from performative safety theater. Researchers who have never worked in a regulatory context may not appreciate the institutional constraints and incentive structures that shape how regulation actually functions. Building a community of practice that spans both worlds is essential for effective frontier AI governance.
The timeline pressure compounds every governance challenge. Frontier AI capabilities are advancing on a trajectory measured in months. Governance institutions operate on timescales measured in years. By the time a regulatory framework is designed, consulted upon, legislated, and implemented, the technology it was designed to govern may have been superseded by systems with fundamentally different risk profiles. This temporal mismatch is not a temporary condition but a structural feature of governing a rapidly advancing technology, and any viable governance model must be designed for continuous adaptation rather than periodic revision.