
RFdiffusion3: All-Atom Protein Design at 10x Speed

RFdiffusion3, from the Institute for Protein Design at the University of Washington, turns AlphaFold3's structure-prediction framework into a generative model for de novo design of all-atom biomolecular interactions. It runs approximately 10x faster than its predecessor, RFdiffusion2, and outperforms it on 37 of 41 enzyme scaffold benchmarks.

By Sean K.S. Shin

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.



Protein design has historically operated at two levels of resolution. At the backbone level, designers specify the overall fold—the arrangement of alpha-helices, beta-sheets, and loops that define a protein's shape. At the all-atom level, every atom's position matters: the precise geometry of a catalytic site, the hydrogen bonding network at a protein-protein interface, the placement of water molecules that mediate binding. Most computational design tools have worked at the backbone level, leaving all-atom detail to subsequent refinement steps that are slow, approximate, and often require manual intervention.

RFdiffusion3, from David Baker's Institute for Protein Design at the University of Washington, closes this gap. It is a generative model that designs proteins at all-atom resolution from the outset, producing complete atomic structures rather than backbone traces that must be elaborated. It does so approximately 10x faster than RFdiffusion2, and it outperforms its predecessor on 37 of 41 enzyme scaffold benchmarks.

From Prediction to Generation: Inverting AlphaFold3

The conceptual architecture of RFdiffusion3 is best understood as an inversion of AlphaFold3. AlphaFold3 takes a biomolecular system—protein sequences, nucleic acids, small molecules, ions—and predicts their three-dimensional arrangement. It is a structure prediction model: given components, output structure.

RFdiffusion3 runs this logic in reverse. It starts from a desired structural outcome (a binding interface, a catalytic geometry, a scaffolded functional site) and generates the protein sequence and structure that would produce it. This inversion builds on the same kind of learned representations of biomolecular structure that make AlphaFold3 accurate at prediction, repurposed for generation through a diffusion framework.

The diffusion process works by iteratively denoising a random cloud of atoms into a coherent structure. At each step, the model applies what it has learned about physical constraints (bond angles, van der Waals radii, hydrogen-bonding geometries, hydrophobic packing) to move atoms toward chemically and physically plausible positions. The "all-atom" designation is critical. Unlike backbone-only diffusion, RFdiffusion3 models sidechain atoms, ligand atoms, and solvent-exposed surfaces simultaneously. This produces designs that are closer to experimentally realizable structures and reduces the need for post-hoc refinement.
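The denoising loop described above can be illustrated with a toy sampler. This is a minimal sketch under stated assumptions, not RFdiffusion3's code:

- The atom names and coordinates are made up.
- `oracle_denoiser` is a hypothetical stand-in that simply returns the answer. In a real model, a trained network predicts the clean structure from the noisy one.
- The update rule is a deterministic Euler step with a Karras-style noise schedule. This is a common diffusion sampler, chosen here only for illustration.

What the sketch shows is the all-atom aspect: backbone, sidechain, and ligand atoms all start as one random cloud and are denoised together.

```python
import random

# Toy "all-atom" system: backbone, sidechain, and ligand atoms share one
# coordinate set. Names and coordinates (angstroms) are illustrative only.
TARGET = {
    "N": (0.0, 0.0, 0.0),
    "CA": (1.46, 0.0, 0.0),
    "C": (2.0, 1.42, 0.0),
    "CB": (2.0, -0.77, 1.2),   # sidechain atom
    "LIG": (3.5, -1.5, 2.5),   # ligand atom
}


def oracle_denoiser(x, sigma):
    """Hypothetical stand-in for a trained network that predicts the clean
    structure from noisy coordinates x at noise level sigma. This toy version
    ignores its inputs and returns the known target."""
    return {name: TARGET[name] for name in x}


def noise_schedule(n_steps, sigma_max=80.0, sigma_min=0.002, rho=7.0):
    """Karras-style decreasing noise levels, with a final sigma of 0."""
    hi, lo = sigma_max ** (1 / rho), sigma_min ** (1 / rho)
    sigmas = [(hi + i / (n_steps - 1) * (lo - hi)) ** rho for i in range(n_steps)]
    return sigmas + [0.0]


def sample(n_steps=20, seed=0):
    rng = random.Random(seed)
    sigmas = noise_schedule(n_steps)
    # Start from a random cloud: every atom, ligand included, is pure noise.
    x = {name: tuple(rng.gauss(0.0, sigmas[0]) for _ in range(3)) for name in TARGET}
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x0_hat = oracle_denoiser(x, sigma)
        # Deterministic Euler step: move every atom toward the prediction,
        # keeping a fraction sigma_next / sigma of its remaining offset.
        ratio = sigma_next / sigma
        x = {
            name: tuple(d + ratio * (xi - d) for xi, d in zip(x[name], x0_hat[name]))
            for name in x
        }
    return x


final = sample()
print(final["LIG"])
```

Because the toy denoiser is perfect, the last step (where sigma reaches 0) lands exactly on the target. With a learned denoiser, the prediction at each noise level is only an estimate. The quality of the final structure then depends on how well the network has captured the chemistry.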

Performance: 37 of 41 Benchmarks

The preprint (bioRxiv 2025.09.18.676967) reports that RFdiffusion3 outperforms RFdiffusion2 on 37 of 41 enzyme scaffold design benchmarks. These benchmarks evaluate the model's ability to generate protein scaffolds that position catalytic residues in geometries compatible with enzymatic function.

Claim	Source	Confidence	Status
RFdiffusion3 enables de novo design of all-atom biomolecular interactions	bioRxiv 2025.09.18.676967 abstract	High	Stated in abstract
Approximately 10x faster than RFdiffusion2	bioRxiv 2025.09.18.676967 abstract	High	Stated in abstract
Outperforms on 37 of 41 enzyme scaffold benchmarks	bioRxiv 2025.09.18.676967 abstract	High	Stated in abstract
Inverts AlphaFold3 prediction framework into generative model	bioRxiv 2025.09.18.676967 abstract	High	Stated in abstract

Enzyme design is a demanding test case because catalytic function depends on sub-angstrom positioning of key residues. A designed enzyme that places a catalytic triad's residues even 0.5 angstroms from their optimal positions may show dramatically reduced activity. The fact that RFdiffusion3 outperforms on the vast majority of these benchmarks while operating at all-atom resolution suggests that the model has learned meaningful representations of the geometric requirements for enzyme function.
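One way to put a number on "0.5 angstroms off" is distance RMSD (dRMSD). This metric compares all pairwise distances between catalytic atoms, so it needs no structural superposition. The sketch below makes several assumptions:

- The abstract does not state which metric or success threshold the enzyme benchmarks use. dRMSD is simply one standard, superposition-free choice.
- The three atom positions are hypothetical.

```python
import itertools
import math


def drmsd(coords_a, coords_b):
    """Distance RMSD between two equally ordered atom lists.
    Compares all pairwise distances, so no superposition is needed."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equal-length lists of at least two atoms")
    sq_diffs = [
        (math.dist(coords_a[i], coords_a[j]) - math.dist(coords_b[i], coords_b[j])) ** 2
        for i, j in itertools.combinations(range(len(coords_a)), 2)
    ]
    return math.sqrt(sum(sq_diffs) / len(sq_diffs))


# Hypothetical positions (angstroms) of three catalytic atoms in an ideal site.
ideal = [(0.0, 0.0, 0.0), (2.8, 0.0, 0.0), (5.0, 1.6, 0.0)]

# A rigidly translated copy has identical internal geometry.
shifted = [(x + 10.0, y - 3.0, z + 1.0) for x, y, z in ideal]

# A design whose middle atom sits 0.5 angstroms off along y.
perturbed = [ideal[0], (2.8, 0.5, 0.0), ideal[2]]

print(round(drmsd(ideal, shifted), 6))
print(round(drmsd(ideal, perturbed), 3))
```

The translated copy scores zero, as it should. The single 0.5 Å displacement shows up as a dRMSD of about 0.15 Å, because the error is averaged over all atom pairs. Any success threshold therefore has to be read together with the metric that defines it.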

The four benchmarks where RFdiffusion2 still outperforms RFdiffusion3 deserve attention. Without detailed analysis of which specific enzyme geometries are involved, it is difficult to determine whether these represent systematic weaknesses in the new architecture or statistical noise in a comparison across 41 test cases.
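A quick exact sign test gives a rough sense of how unlikely 37 wins out of 41 would be if the two models were equally good. The test rests on two simplifying assumptions, neither established by the preprint's abstract:

- Each benchmark is an independent win-or-loss outcome.
- There are no ties.

```python
from math import comb


def sign_test_p_value(wins, n):
    """One-sided exact sign test: P(X >= wins) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


p = sign_test_p_value(37, 41)
print(f"P(>= 37 of 41 wins by chance) = {p:.1e}")  # 5.1e-08
```

A p-value this small says the overall pattern is very unlikely to be chance. It says nothing about whether the four individual losses reflect a systematic weakness, which is the open question raised above.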

The 10x Speed Improvement

The approximate 10x speed improvement over RFdiffusion2 has practical consequences that extend beyond convenience. Protein design workflows are iterative: designers generate many candidates, filter them computationally, synthesize the most promising ones, and test them experimentally. The experimental steps—gene synthesis, protein expression, purification, and functional assays—are slow and expensive. Computational design speed determines how many candidates can be generated and filtered before committing to experimental resources.

A 10x speedup means that the same computational budget produces roughly an order of magnitude more candidate designs. Consider that the success rate of designed proteins remains well below 100%; this rate is the fraction that fold correctly and function as intended when actually synthesized. In that setting, a larger candidate pool gives computational filters more to choose from. At a fixed per-design success rate, it also raises the expected number of functional designs in proportion to the number of candidates evaluated.
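The arithmetic behind that scaling can be sketched as follows. The sketch assumes each candidate succeeds independently with the same probability. `RATE` is a hypothetical value, not a figure from the preprint.

```python
def expected_hits(n_candidates, success_rate):
    """Expected number of functional designs among n independent candidates."""
    return n_candidates * success_rate


def p_at_least_one(n_candidates, success_rate):
    """Probability that at least one of n independent candidates succeeds."""
    return 1.0 - (1.0 - success_rate) ** n_candidates


# Hypothetical per-design success rate; not a number reported for RFdiffusion3.
RATE = 0.01

# Same compute budget before and after a 10x speedup.
for n in (100, 1000):
    print(n, expected_hits(n, RATE), round(p_at_least_one(n, RATE), 3))
```

The expected number of hits scales linearly with the number of candidates. The chance of at least one hit saturates: at this hypothetical rate it rises from about 0.63 to nearly 1. In practice, the experimental budget caps how many designs are synthesized. Most of the gain therefore comes from filtering a larger pool before synthesis, not from testing more designs.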

The speed improvement also lowers the barrier to applying RFdiffusion3 to larger and more complex design problems. Consider all-atom design of a 500-residue protein that interacts with a small-molecule ligand, a metal cofactor, and a partner protein: it involves positioning thousands of atoms simultaneously. The abstract does not report absolute runtimes, but any per-design cost shrinks by roughly the same factor. A design that previously took ten GPU-hours would take about one, which makes such problems more practical to run as routine design tasks.
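To check the "thousands of atoms" scale, one can count heavy (non-hydrogen) atoms. The per-residue counts below are standard amino-acid chemistry: backbone N, CA, C, O plus sidechain atoms, ignoring the terminal OXT. The sequences, ligand size, and metal count in the usage lines are hypothetical. Whether RFdiffusion3 also models hydrogens is not stated in the abstract, so only heavy atoms are counted.

```python
# Heavy atoms per residue inside a polypeptide chain (backbone + sidechain).
HEAVY_ATOMS = {
    "G": 4, "A": 5, "S": 6, "C": 6, "V": 7, "T": 7, "P": 7,
    "I": 8, "L": 8, "D": 8, "N": 8, "M": 8,
    "E": 9, "Q": 9, "K": 9, "H": 10, "F": 11, "R": 11, "Y": 12, "W": 14,
}


def protein_heavy_atoms(sequence):
    """Sum heavy atoms over a one-letter amino-acid sequence."""
    return sum(HEAVY_ATOMS[aa] for aa in sequence.upper())


def system_heavy_atoms(sequences, ligand_heavy_atoms=0, metal_ions=0):
    """Total heavy atoms to place in a multi-component design problem."""
    return sum(protein_heavy_atoms(s) for s in sequences) + ligand_heavy_atoms + metal_ions


# Hypothetical system: 500-residue design, 100-residue partner,
# a 30-heavy-atom ligand, and one metal ion.
design = "ACDEFGHIKLMNPQRSTVWY" * 25
partner = "ACDEFGHIKLMNPQRSTVWY" * 5
total = system_heavy_atoms([design, partner], ligand_heavy_atoms=30, metal_ions=1)
print(len(design), total)
```

With an even amino-acid composition, this hypothetical system has about five thousand heavy atoms, before counting hydrogens.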

All-Atom Design: Why It Matters

The transition from backbone-level to all-atom design addresses a persistent gap in the protein design pipeline. Previous workflows required a two-step process: first, design a backbone fold using RFdiffusion or similar tools; second, place sidechains and optimize their conformations using tools like Rosetta's packer or other sidechain placement algorithms. Each step introduced approximations, and errors in backbone design could not always be corrected by sidechain optimization.

All-atom design integrates these steps, allowing the model to jointly optimize backbone geometry and sidechain positioning. This is particularly important for designing protein-small molecule interactions, where the binding pocket geometry depends on precise sidechain arrangements, and for protein-protein interfaces, where complementarity extends to the atomic level.
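Why staged design can get stuck is easiest to see in a toy two-variable problem. This is a hypothetical illustration, not Rosetta, RFdiffusion, or any real energy function:

- `b` stands in for a backbone degree of freedom and `s` for a sidechain degree of freedom.
- A coupling term links the two variables.
- The staged route fixes `b` using a backbone-only score, then optimizes `s`.
- The joint route searches both variables together by brute-force grid search. This mimics joint optimization in spirit only, since RFdiffusion3 itself works by learned denoising.

```python
# Hypothetical toy energy; the 4 * (s - b)**2 term couples backbone and sidechain.
def energy(b, s):
    return (b - 1.0) ** 2 + 4.0 * (s - b) ** 2 + (s - 3.0) ** 2


def backbone_only_energy(b):
    # What a backbone-first stage sees: the sidechain terms are invisible.
    return (b - 1.0) ** 2


GRID = [i / 100 for i in range(0, 401)]  # 0.00 .. 4.00


def staged_design():
    b = min(GRID, key=backbone_only_energy)             # step 1: fix the backbone
    s = min(GRID, key=lambda cand: energy(b, cand))     # step 2: pack the sidechain
    return b, s, energy(b, s)


def joint_design():
    b, s = min(((b, s) for b in GRID for s in GRID), key=lambda bs: energy(*bs))
    return b, s, energy(b, s)


print(staged_design())
print(joint_design())
```

In this toy, the staged route ends at an energy of about 3.2 and the joint search at about 1.78. Once `b` is frozen, no choice of `s` can recover the difference. This mirrors the point above that backbone errors cannot always be corrected by sidechain optimization.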

Open Questions

Experimental validation rate: Computational outperformance on benchmarks does not guarantee improved experimental success rates. What fraction of RFdiffusion3's top-ranked designs fold correctly and function as intended when synthesized, and how does this compare to RFdiffusion2?

Dynamics and flexibility: All-atom design produces static structures, but proteins are dynamic. Does RFdiffusion3's all-atom accuracy extend to designing proteins whose function depends on conformational changes, allosteric regulation, or intrinsically disordered regions?

Small molecule generalization: The enzyme scaffold benchmarks test catalytic site geometry, but therapeutic protein design often involves designing interactions with drug-like small molecules that are not enzyme substrates. How well does the model generalize to these chemically diverse targets?

Accessibility and compute requirements: At what computational cost does RFdiffusion3 operate, and is the 10x speedup sufficient to make all-atom design accessible to academic labs without large GPU clusters?

The inversion of structure prediction into structure generation represents a conceptual advance in how the field uses deep learning for molecular design. By repurposing the physics learned through prediction tasks, RFdiffusion3 brings the protein design community closer to a workflow where atomic-level design intent can be directly expressed and realized.


References

Institute for Protein Design, University of Washington. (2025). De novo design of all-atom biomolecular interactions with RFdiffusion3. bioRxiv 2025.09.18.676967.
