Trend AnalysisComputer SystemsDesign Science Research

The Quiet Return of Formal Methods: AI Makes Mathematical Software Verification Practical

Formal methods—mathematically proving software correctness—have long been too expensive for general use. AI is changing the economics: LLM-assisted proof generation, automated test-case synthesis, and GenAI+formal methods synergy are making verification practical for automotive, aerospace, and security-critical software.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

For decades, formal methods occupied a peculiar position in software engineering: universally respected and almost universally avoided. The idea of mathematically proving that software behaves correctly—rather than merely testing it on a finite set of inputs—appeals to anyone who has suffered the consequences of a software bug in a safety-critical system. But the cost of formal verification—in specialized expertise, in development time, in tool complexity—confined it to a narrow niche: avionics, nuclear systems, and the occasional security protocol.

AI is changing this economics. Not by replacing formal verification with something less rigorous, but by automating the most labor-intensive parts of the verification process. LLMs that generate proof obligations, RL agents that search proof spaces, and generative AI that produces verified code alongside its proofs are collectively making formal methods accessible to a broader class of software projects.

Averill's essay (presented at ACM, 2025) captures the moment with an apt theatrical metaphor: formal methods have been waiting in the wings while testing held center stage. The escalating scale and sophistication of cyberattacks, combined with AI's ability to reduce verification costs, may finally bring formal methods onto the main stage of software development.

Automotive Software: Where the Stakes Justify the Cost

Pan et al. focus on the domain where formal verification is most urgently needed and where AI assistance provides the greatest leverage: automotive software. Modern vehicles contain substantial amounts of software code, with safety-critical subsystems—braking, steering, collision avoidance—subject to rigorous verification requirements. The automotive safety standard ISO 26262 highly recommends formal verification for the highest Automotive Safety Integrity Levels (ASIL-D), making it one of the preferred verification methods for safety-critical automotive software.

Their workflow demonstrates the AI and model-based engineering synergy:

  • Requirement formalization: Natural language requirements ("The braking system shall activate within 100ms of sensor trigger") are translated to formal specifications using LLM-assisted formalization
  • Code generation: Generative AI produces implementation code from the formal specifications
  • Verification: Formal methods tools verify that the generated code satisfies the formal specification
  • Iterative refinement: When verification fails, the AI receives the counterexample and generates corrected code
  • The closed loop between generation and verification is the key innovation. Traditional development separates these activities—code is written first, then (if budget permits) verified. The integrated approach makes verification a continuous constraint on generation, catching errors at creation time rather than after deployment.

    Testing-Based Formal Verification

    Liu et al. propose a middle ground between full formal verification (proving correctness for all inputs) and traditional testing (checking correctness for selected inputs). Testing-based formal verification (TBFV) uses formal methods to guarantee that tested execution paths are correct, while testing ensures that enough paths are exercised to provide meaningful coverage.

    Their contribution is a condition sequence coverage criterion that determines which test cases are needed to maximize the paths verified by TBFV. The automated test case generation tool produces test inputs that, when combined with formal path verification, provide coverage guarantees stronger than testing alone but achievable at lower cost than full formal verification.

    This pragmatic approach may represent the most adoptable path for bringing formal methods into mainstream software development: not full verification of everything, but formal verification of the paths that matter most, combined with testing for the rest.

    LLM-Driven SAT Solving

    Sheng et al.'s SolSearch applies LLMs to a foundational formal methods tool: SAT solvers. The Boolean satisfiability problem (SAT)—determining whether a logical formula can be satisfied—underlies most automated verification tools. Better SAT solvers directly improve the scalability of formal verification.

    SolSearch uses LLMs not to solve SAT instances directly but to generate efficient SAT-solving code. The LLM explores different algorithmic strategies for specific categories of SAT problems, generating solver code that is optimized for the problem structure at hand. This is a form of automated algorithm design: the LLM acts as a programmer that writes solvers tailored to each verification task.

    The results show that LLM-generated solvers can outperform general-purpose solvers on structured instances—the type of instances that arise from real software verification problems. The approach does not replace decades of SAT solver engineering but adds a learned, problem-specific optimization layer on top of it.

    Claims and Evidence

    <
    ClaimEvidenceVerdict
    AI reduces the cost of formal verificationPan et al. demonstrate automated requirement formalization and verified code generation✅ Supported
    LLM-assisted verification is applicable to automotive safety standardsISO 26262 ASIL-D context demonstrated✅ Supported (domain-specific)
    Testing-based formal verification offers a practical middle groundLiu et al. show stronger guarantees than testing at lower cost than full verification✅ Supported
    LLM-generated SAT solvers match hand-optimized solversSolSearch shows improvement on structured instances; general instances are mixed⚠️ Instance-dependent
    Formal methods will become mainstream in software developmentAverill argues persuasively; adoption remains early⚠️ Trajectory is positive, adoption is nascent

    Open Questions

  • Specification correctness: AI can help verify that code matches its specification. But who verifies the specification? If the specification is wrong, verified code is correctly implementing the wrong behavior—a failure that formal methods cannot catch.
  • Scalability to legacy codebases: Most formal verification research assumes green-field development. How do we apply formal methods to the hundreds of millions of lines of existing, un-verified code that powers critical infrastructure?
  • Verification of AI components: Safety-critical systems increasingly include AI components (neural networks, LLM-based controllers). Can formal methods verify the behavior of these non-deterministic, statistically-defined components?
  • Developer training: AI lowers the barrier to using formal methods but does not eliminate it entirely. Developers still need to understand what formal verification guarantees (and what it does not) to use it effectively.
  • Certification pathway: Regulatory bodies (FAA for aviation, NHTSA for automotive) have established certification pathways for traditional verification methods. How should AI-assisted formal verification be certified?
  • What This Means for Your Research

    For software engineering researchers, the AI-formal methods convergence creates a new research agenda: not just building better verification tools, but building tools that integrate seamlessly with AI-assisted development workflows. The automotive use case (Pan et al.) provides a template; similar integration opportunities exist in medical devices, financial systems, and critical infrastructure.

    For formal methods researchers, AI addresses the field's historical adoption barrier: cost. If LLMs can automate requirement formalization, proof generation, and test case synthesis, the human effort required for verification drops substantially—potentially enough to make formal methods economically viable for a much broader class of software.

    For practitioners building safety-critical systems, the message is that the traditional choice between "verified but expensive" and "unverified but practical" is dissolving. AI-assisted formal methods offer a middle path that provides meaningful correctness guarantees at a cost that mainstream development budgets can absorb.

    References (4)

    [1] Averill, C. (2025). The Proof Must Go On: Formal Methods in the Theater of Secure Software Development. ACM ESEC/FSE.
    [2] Pan, F., Song, Y., Wen, L. et al. (2025). Automating Automotive Software Development: A Synergy of Generative AI and Model-Based Methods. arXiv:2505.02500.
    [3] Liu, A., Liu, Y., Rao, L. et al. (2025). Condition Sequence Coverage Criterion and Automatic Test Case Generation for Testing-Based Formal Verification. IEEE ISSRE.
    [4] Sheng, J., Lin, Y., Wu, J. et al. (2025). SolSearch: An LLM-Driven Framework for Efficient SAT-Solving Code Generation. IEEE ICSE-NIER.

    Explore this topic deeper

    Search 290M+ papers, detect research gaps, and find what hasn't been studied yet.

    Click to remove unwanted keywords

    Search 8 keywords →