Trend Analysis | Chemistry & Materials

AI-Driven Drug Discovery: Diffusion Models Learn to Design Molecules

By Sean K.S. Shin

This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

The Question

Traditional drug discovery takes 10–15 years and costs an estimated $2–3 billion per approved drug (including the cost of failures), with a failure rate above 90%. Computational approaches promise to compress this timeline by designing molecules in silico before synthesis. The latest development is the application of diffusion models, the class of generative model behind image generators such as DALL-E 2 and Stable Diffusion, to 3D molecular structure generation. Can these models generate drug-like molecules that bind target proteins with high affinity, or do they produce chemically plausible but biologically useless structures?

Landscape

Huang et al. (2024), writing in Nature Communications, introduced a dual diffusion model that generates 3D molecules directly within target protein binding pockets. Unlike earlier generative models that produced SMILES strings or 2D molecular graphs, this approach operates in 3D coordinate space, placing atoms where the pocket geometry and electrostatics favour binding. The model generates the molecular structure and optimises the binding pose simultaneously, two tasks that previously required separate computational steps.
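To make "operating in 3D coordinate space" more concrete, here is a minimal sketch of the forward (noising) step of a generic denoising diffusion model applied to atom coordinates. It is not the dual diffusion model of Huang et al.: the linear noise schedule, the function names, and the three-atom toy molecule are all assumptions chosen for illustration. A trained model would learn to reverse this noising process, conditioned on the binding pocket.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances, one per step (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """alpha_bar[t] is the product of (1 - beta[s]) for all s <= t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_coordinates(coords, t, alpha_bars, rng):
    """Sample x_t from N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [
        tuple(signal * c + sigma * rng.gauss(0.0, 1.0) for c in atom)
        for atom in coords
    ]


# Toy "molecule": three atoms with made-up coordinates (angstroms).
atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
alpha_bars = cumulative_alpha_bar(linear_beta_schedule(1000))
rng = random.Random(0)

slightly_noised = noise_coordinates(atoms, 10, alpha_bars, rng)   # still close to the input
fully_noised = noise_coordinates(atoms, 999, alpha_bars, rng)     # close to pure Gaussian noise
```

Early steps barely perturb the geometry, while the final step leaves almost no trace of the original coordinates. Generation runs the learned reverse process from that noise back to a plausible structure.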

Yim et al. (2024) reviewed diffusion models across structural biology (protein structure prediction, molecular docking, and molecule generation) and provided a unified mathematical framework. They identified the key advantage of diffusion models as the ability to generate diverse, high-quality samples from complex distributions. That ability is critical for exploring chemical space, where widely cited estimates put the number of drug-like molecules above 10⁶⁰.
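One common way to put a number on "diverse" is the mean pairwise Tanimoto distance between molecular fingerprints. The sketch below is an illustration, not necessarily the metric used in the reviewed work. It assumes each fingerprint is represented as a set of "on" bit indices, and the three example fingerprints are made up. In practice a cheminformatics toolkit would compute them from the generated structures.

```python
from itertools import combinations


def tanimoto_similarity(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def mean_pairwise_diversity(fingerprints):
    """Average of (1 - Tanimoto similarity) over all unordered pairs."""
    pairs = list(combinations(fingerprints, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical fingerprints for three generated molecules.
generated = [{1, 4, 7, 9}, {1, 4, 8}, {2, 3, 5, 9}]
diversity = mean_pairwise_diversity(generated)
```

A value near 0 means the generator keeps producing near-identical molecules. A value near 1 means the samples share almost no fingerprint features.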

Chen et al. (2025), in Nature Machine Intelligence, applied deep lead optimisation directly within protein pockets and designed potent and selective ligands for the LTK protein, an experimentally tested result that points toward practical applicability. Vost et al. (2025) analysed how different ways of incorporating protein structural information into generative models affect molecule quality.

Key Claims & Evidence

Claim	Evidence	Verdict
Diffusion models generate 3D molecules with high binding affinity	Dual diffusion model produces molecules with docking scores comparable to known drugs (Huang et al. 2024)	Supported computationally; experimental validation limited
Structure-based generation outperforms ligand-based approaches	Pocket-aware models generate more diverse and target-relevant molecules (Vost et al. 2025)	Supported across benchmarks
AI-designed molecules achieve potent target inhibition	LTK ligands from deep optimisation show nanomolar activity (Chen et al. 2025)	Demonstrated experimentally; a strong validation
Diffusion models can handle multiple structural biology tasks	Unified framework for protein design, docking, and molecule generation (Yim et al. 2024)	Supported; suggests convergent computational paradigm

Open Questions

Synthesisability: Can generated molecules actually be synthesised? Many AI-designed structures contain exotic functional groups or strained ring systems that are computationally plausible but synthetically impractical.

ADMET prediction: Binding affinity is necessary but insufficient. Can generative models jointly optimise for absorption, distribution, metabolism, excretion, and toxicity?
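Until generative models optimise these properties jointly, a common workaround is to filter generated candidates after the fact. The sketch below applies Lipinski's rule of five, a classic heuristic for oral absorption, and then ranks the survivors by docking score. It is only a coarse proxy for the "A" in ADMET and is not taken from any of the cited papers. The `Candidate` class, the descriptor values, and the docking scores are hypothetical; a real pipeline would compute descriptors with a cheminformatics toolkit and use learned ADMET predictors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    mol_weight: float        # daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int
    docking_score: float     # kcal/mol; more negative means better predicted binding


def lipinski_violations(c):
    """Count how many of the four rule-of-five thresholds a candidate exceeds."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def shortlist(candidates, max_violations=1):
    """Keep candidates within the violation budget, best docking score first."""
    passing = [c for c in candidates if lipinski_violations(c) <= max_violations]
    return sorted(passing, key=lambda c: c.docking_score)


# Made-up generated molecules with precomputed descriptors.
candidates = [
    Candidate("gen-001", 342.4, 2.8, 2, 5, -9.1),
    Candidate("gen-002", 688.9, 6.3, 4, 12, -11.4),
    Candidate("gen-003", 455.0, 4.1, 1, 7, -9.8),
]
ranked = shortlist(candidates)
```

Here the candidate with the best docking score, `gen-002`, is dropped because it breaks three of the four thresholds. This is the open question in miniature: the strongest predicted binder is not necessarily a viable drug.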

Prospective validation: Most benchmarks evaluate generated molecules retrospectively. How many AI-designed molecules have entered clinical trials, and what are their success rates?

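Intellectual property: If an AI generates a novel molecular structure, who holds the patent? Patent law in major jurisdictions, including the United States, the United Kingdom, and the European Patent Office, currently requires a human inventor.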

Referenced Papers

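[1] Huang, L., Xu, T., Yu, Y., Zhao, P., Chen, X., Han, J., et al. (2024). A dual diffusion model enables 3D molecule generation and lead optimization based on target pockets. Nature Communications, 15(1), 3170. DOI: 10.1038/s41467-024-46569-1
[2] Yim, J., Stärk, H., Corso, G., Jing, B., Barzilay, R., & Jaakkola, T. S. (2024). Diffusion models in protein structure and docking. WIREs Computational Molecular Science, 14(2). DOI: 10.1002/wcms.1711
[3] Chen, S., Zhang, O., Jiang, C., Zhao, H., Zhang, X., Chen, M., et al. (2025). Deep lead optimization enveloped in protein pocket and its application in designing potent and selective ligands targeting LTK protein. Nature Machine Intelligence, 7(3), 448-458. DOI: 10.1038/s42256-025-00997-w
[4] Vost, L., Ziv, Y., & Deane, C. M. (2025). Incorporating targeted protein structure in deep learning methods for molecule generation in computational drug design. Chemical Science, 16(44), 20677-20693. DOI: 10.1039/d5sc05748e
[5] Tagliazucchi, L., & Costi, M. P. (2025). Mass Spectrometry Proteomics: A Key to Faster Drug Discovery. Journal of Medicinal Chemistry. DOI: 10.1021/acs.jmedchem.5c01986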

Field: Chemistry · Computational Science | Methodology: Computational

Author: Sean K.S. Shin | Date: 2026-03-17
