Lean-auto: Bridging Interactive Proof Assistants and Automated Theorem Provers

Lean 4 is a fast-growing proof assistant for formalizing mathematics, but users must still construct proofs manually, tactic by tactic. Lean-auto connects Lean to external automated theorem provers, so routine goals that would otherwise require tedious manual construction can be dispatched with a single tactic call.

By Sean K.S. Shin

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Interactive theorem provers (ITPs) like Lean 4, Coq, and Isabelle enable mathematicians and computer scientists to construct machine-checked proofs of mathematical theorems and software correctness properties. The proofs are rigorous—every step is verified against the system's foundational logic—but the process of constructing them is painstaking. Users write proofs interactively, applying tactics one at a time, inspecting the resulting proof state, and deciding the next tactic. For complex proofs, this process can take hours or days of expert effort.

Automated theorem provers (ATPs) like E, Vampire, and Z3 take a different approach: given a logical formula, they search for a proof automatically, using resolution, superposition, and SMT (satisfiability modulo theories) techniques. ATPs are powerful for first-order logic but cannot directly handle the richer type theories that ITPs use.
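
To make "search for a proof automatically" concrete, here is a minimal sketch of resolution by refutation, restricted to propositional logic. It is a toy, not how E or Vampire are implemented: real provers work on first-order clauses with unification, term orderings, and heavy indexing. The clause encoding (strings such as "p" and "~p") and the function names are assumptions made for this illustration.

```python
from itertools import combinations


def resolve(c1, c2):
    """Return every resolvent of two clauses.

    A clause is a frozenset of literals; a literal is 'p' or its negation '~p'.
    """
    resolvents = []
    for lit in c1:
        complement = lit[1:] if lit.startswith("~") else "~" + lit
        if complement in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {complement})))
    return resolvents


def refutes(clauses):
    """Saturate the clause set under resolution.

    Returns True if the empty clause is derived (the set is unsatisfiable),
    and False once no new clause can be produced.
    """
    known = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for resolvent in resolve(c1, c2):
                if not resolvent:
                    return True  # empty clause: contradiction found
                new.add(resolvent)
        if new <= known:
            return False  # saturated without contradiction
        known |= new


# To prove q from p and (p -> q), add the negated goal ~q and look for a contradiction.
proved = refutes([{"p"}, {"~p", "q"}, {"~q"}])
```

The goal is negated before the search starts, which is why a hammer reports success when the ATP finds the premises plus the negated goal to be contradictory.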

Hammers bridge this gap: they translate goals from the ITP's type theory into first-order logic, send them to ATPs, and translate the proofs back. Isabelle's Sledgehammer has been the gold standard for years—handling routine proof obligations automatically and freeing users to focus on the creative aspects of proof construction.
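
As an illustration of the "send them to ATPs" step, the sketch below renders a goal and its premises in TPTP first-order form (FOF), the input language accepted by provers such as E and Vampire. This post does not describe the exact format Lean-auto emits, so treat this as a generic example of what a first-order problem file looks like. The function name, lemma names, and formulas are all made up for the example.

```python
def tptp_problem(premises, conjecture):
    """Render named premises and a conjecture as TPTP FOF text.

    premises: dict mapping a lemma name to a formula already written in TPTP syntax.
    conjecture: the goal formula, also in TPTP syntax.
    """
    lines = [f"fof({name}, axiom, {formula})." for name, formula in premises.items()]
    lines.append(f"fof(goal, conjecture, {conjecture}).")
    return "\n".join(lines)


# Hypothetical premises and goal, chosen only to show the file layout.
problem = tptp_problem(
    {
        "all_men_mortal": "![X]: (man(X) => mortal(X))",
        "socrates_man": "man(socrates)",
    },
    "mortal(socrates)",
)
print(problem)
```

Each `fof(...)` line carries a name, a role (`axiom` or `conjecture`), and a formula. The prover negates the conjecture internally and searches for a refutation.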

Lean 4, despite its rapid adoption, has lacked a mature hammer. Qian et al.'s Lean-auto fills this gap, providing the first production-quality interface between Lean 4 and external ATPs.

The Lean-auto Architecture

Lean-auto operates as a Lean 4 tactic that, when invoked, performs the following steps:

Goal extraction: Reads the current proof state—the goal to be proved and the available hypotheses

Translation: Converts the Lean 4 goal (expressed in dependent type theory) into first-order logic, handling the type-theoretic features (dependent types, universe polymorphism, type classes) that have no direct first-order counterpart

Premise selection: Identifies relevant lemmas from Lean's Mathlib library that might be useful for the proof, using machine learning-based relevance ranking

ATP invocation: Sends the translated goal and selected premises to one or more ATPs (E, Vampire, Z3, CVC5)

Proof reconstruction: If an ATP finds a proof, Lean-auto translates it back into a Lean 4 tactic script that Lean's kernel can verify
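
The five steps above can be sketched as a small pipeline. This is a schematic under stated assumptions, not Lean-auto's actual code: every name here (`Goal`, `Untranslatable`, `hammer`, and the stage functions passed in) is hypothetical, and the stages are supplied as plain callables so the control flow is visible.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Goal:
    """Step 1 output: the statement to prove plus the local hypotheses."""
    statement: str
    hypotheses: Sequence[str]


class Untranslatable(Exception):
    """Raised by a translator when a goal has no faithful encoding."""


def hammer(
    goal: Goal,
    translate: Callable[[Goal], str],
    select_premises: Callable[[Goal], Sequence[str]],
    provers: Sequence[Callable[[str, Sequence[str]], Optional[str]]],
    reconstruct: Callable[[str], str],
) -> Optional[str]:
    """Run translation, premise selection, ATP calls, and reconstruction.

    Returns a proof script, or None when the goal stays unsolved.
    """
    try:
        problem = translate(goal)            # step 2
    except Untranslatable:
        return None                          # fall back to "unsolved"
    premises = select_premises(goal)         # step 3
    for prover in provers:                   # step 4: try each ATP in turn
        atp_proof = prover(problem, premises)
        if atp_proof is not None:
            return reconstruct(atp_proof)    # step 5
    return None


# Stub stages, invented for the example: the first prover fails, the second succeeds.
script = hammer(
    Goal("q", ["p", "p -> q"]),
    translate=lambda g: " & ".join(g.hypotheses) + " |- " + g.statement,
    select_premises=lambda g: [],
    provers=[lambda prob, prem: None, lambda prob, prem: "modus ponens"],
    reconstruct=lambda proof: f"-- found by ATP: {proof}",
)
```

Trying provers in sequence keeps the sketch short. A real tool would more likely run them in parallel with a timeout and take the first answer.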

The translation step is the most technically challenging. Lean 4's type theory is strictly more expressive than first-order logic—some goals cannot be translated faithfully. Lean-auto handles this through a combination of encoding techniques (defunctionalization for higher-order functions, monomorphization for polymorphism) and graceful fallback (returning "unsolved" for goals that resist translation).
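
Monomorphization is easiest to see on a small example. The idea is to replace a fact that is polymorphic in a type variable with one copy per concrete type that actually occurs in the goal. The sketch below does this over a toy representation of types as nested tuples. That representation and the function names are assumptions for illustration and say nothing about Lean-auto's internal data structures.

```python
def substitute(ty, var, concrete):
    """Replace the type variable `var` inside a type term.

    A type term is either a string (a type name or variable) or a tuple
    (constructor, arg1, arg2, ...), for example ("List", "α").
    """
    if ty == var:
        return concrete
    if isinstance(ty, tuple):
        return tuple(substitute(part, var, concrete) for part in ty)
    return ty


def monomorphize(signature, var, concrete_types):
    """Return one monomorphic copy of `signature` per concrete type."""
    return {c: substitute(signature, var, c) for c in concrete_types}


# Hypothetical polymorphic signature: reverse : List α -> List α
reverse_sig = ("->", ("List", "α"), ("List", "α"))

# Suppose the goal only mentions lists of Nat and lists of Bool.
instances = monomorphize(reverse_sig, "α", ["Nat", "Bool"])
```

Only the instances needed by the goal are generated, which keeps the translated problem finite even though the original fact ranges over every type.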

Machine Learning for Premise Selection

Piepenbrock's work on ML-guided ATP provides the theoretical foundation for Lean-auto's premise selection step. The Mathlib library contains over 200,000 theorems (as of 2025). Sending all of them to the ATP would overwhelm it; sending too few might omit the critical lemma. ML-based premise selection—trained on the corpus of existing Lean proofs—predicts which lemmas are most likely to be useful for a given goal.
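
The trade-off between sending too many lemmas and too few comes down to a top-k cut over some relevance score. The sketch below uses a deliberately naive score, the Jaccard overlap between the symbols in the goal and the symbols in each lemma, in place of a learned model. The library contents and the function name are made up for the example.

```python
def rank_premises(goal_symbols, library, k):
    """Return the k lemma names whose symbols overlap most with the goal.

    goal_symbols: set of symbols occurring in the goal.
    library: dict mapping lemma name -> set of symbols in its statement.
    """
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(
        library.items(),
        key=lambda item: (-jaccard(goal_symbols, item[1]), item[0]),
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical three-lemma library; a real library has hundreds of thousands of entries.
library = {
    "add_comm": {"+", "="},
    "mul_comm": {"*", "="},
    "length_append": {"length", "++", "+", "="},
}
selected = rank_premises({"+", "="}, library, k=2)
```

A learned ranker replaces `jaccard` with a model score, but the surrounding logic (score everything, keep the top k) stays the same.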

The ML model learns from the history of proof construction: when mathematicians proved similar goals in the past, which lemmas did they use? This historical data encodes expert mathematical judgment about lemma relevance—judgment that the ML model distills into a fast, automated relevance ranking.
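
One simple way to learn from proof history is k-nearest-neighbours: find the past goals most similar to the current one and let the lemmas used in their proofs vote. The sketch below shows that idea as a stand-in, since this post does not specify which model is used. The feature sets, history records, and function name are hypothetical.

```python
from collections import Counter


def knn_premises(goal_features, history, k_neighbors, top_n):
    """Suggest lemmas by similarity-weighted voting over past proofs.

    history: list of (feature_set, lemmas_used) pairs, one per past goal.
    """
    def similarity(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    # Keep the k past goals most similar to the current goal.
    neighbors = sorted(
        history, key=lambda rec: -similarity(goal_features, rec[0])
    )[:k_neighbors]

    votes = Counter()
    for features, lemmas in neighbors:
        weight = similarity(goal_features, features)
        for lemma in lemmas:
            votes[lemma] += weight  # closer neighbours count for more

    ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
    return [lemma for lemma, _ in ranked[:top_n]]


# Made-up proof history: features of a past goal and the lemmas its proof used.
history = [
    ({"+", "=", "Nat"}, ["add_comm", "add_zero"]),
    ({"+", "=", "Nat", "succ"}, ["add_succ", "add_comm"]),
    ({"*", "=", "Nat"}, ["mul_comm"]),
]
suggested = knn_premises({"+", "=", "Nat"}, history, k_neighbors=2, top_n=2)
```

Here `add_comm` ranks first because both of the nearest past goals used it, which is the "expert judgment encoded in history" effect described above.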

Claims and Evidence

Claim	Evidence	Verdict
Lean-auto provides functional hammer capability for Lean 4	Integration with Mathlib demonstrated	✅ Demonstrated
Translation from dependent type theory to FOL is feasible	Encoding techniques handle common patterns	✅ Supported (with limitations)
ML-based premise selection improves ATP success rate	Consistent finding in Isabelle/Sledgehammer literature	✅ Well-established
Lean-auto matches Sledgehammer's capability	Lean-auto is newer and less mature	⚠️ Approaching but not yet equivalent
Hammers eliminate the need for manual proof construction	Hammers handle routine goals; creative proof steps require human insight	⚠️ Complements, not replaces

Open Questions

Coverage: What fraction of Mathlib proof goals can Lean-auto solve automatically? Sledgehammer handles a significant fraction of Isabelle goals in benchmark evaluations. What is Lean-auto's success rate on comparable Lean 4 goals?

Speed: ATP invocation adds latency to the proof development workflow. Can Lean-auto provide sub-second responses for the majority of solvable goals?
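
Both the coverage and the speed question reduce to simple statistics over a benchmark run. The sketch below computes the fraction of goals solved and, among solved goals, the share answered within a time budget. The result tuples are invented placeholder values, not measurements of Lean-auto or any other tool.

```python
def summarize(results, budget_seconds=1.0):
    """Summarize a benchmark run.

    results: list of (solved, seconds) pairs, one per goal.
    Returns (solve_rate, fast_share), where fast_share is the fraction of
    solved goals that finished in under `budget_seconds`.
    """
    solved_times = [seconds for solved, seconds in results if solved]
    solve_rate = len(solved_times) / len(results) if results else 0.0
    fast_share = (
        sum(t < budget_seconds for t in solved_times) / len(solved_times)
        if solved_times
        else 0.0
    )
    return solve_rate, fast_share


# Placeholder data for illustration only.
run = [(True, 0.4), (True, 2.5), (False, 10.0), (True, 0.9)]
solve_rate, fast_share = summarize(run)
```

Reporting the two numbers separately matters: a tool can have a high solve rate and still feel slow interactively if most successes arrive only near the timeout.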

Proof quality: ATP-generated proofs, translated back to Lean, may be longer and less readable than hand-written proofs. Can the reconstruction step produce concise, idiomatic Lean proofs?

Integration with AI provers: Can Lean-auto integrate with LLM-based provers (Goedel-Prover, Seed-Prover) in addition to traditional ATPs? The combination of ATP rigor with LLM creativity might handle a broader range of goals.

What This Means for Your Research

For Lean 4 users (an increasingly large community including mathematicians, computer scientists, and verification engineers), Lean-auto is an immediate productivity tool. Routine proof obligations that previously required manual tactic construction can now be dispatched with a single command.

For formal verification researchers, Lean-auto brings Lean 4 closer to parity with Isabelle in terms of automation infrastructure—a critical factor for Lean's adoption in industrial verification projects where proof engineer productivity determines project feasibility.

References (2)

[1] Qian, Y., Clune, J., Barrett, C., & Avigad, J. (2025). Lean-auto: An Interface between Lean 4 and Automated Theorem Provers. arXiv:2505.14929.

[2] Piepenbrock, J. (2025). Guiding Automated Theorem Proving with Machine Learning.