ML-Guided Enzyme Engineering: Designing Industrial Biocatalysts with Artificial Intelligence
By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.
Why It Matters
Enzymes catalyze reactions with exquisite selectivity under mild conditions, but natural enzymes rarely perform well in industrial settings (high temperatures, organic solvents, extreme pH). Directed evolution (recognized with the 2018 Nobel Prize in Chemistry, shared by Frances Arnold) revolutionized enzyme optimization, but exploring the vast sequence space (20^N possible sequences for N variable residues) through random mutagenesis alone remains impractically slow. Machine learning is changing this calculus by navigating fitness landscapes more intelligently, aiming to find improved enzymes with a fraction of the experimental effort.
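To make the 20^N figure concrete, here is a small worked calculation. It is only an illustration: the function names and the screening budget of 10^6 variants are assumptions chosen for the example, not values from any of the cited papers.

```python
# Size of the sequence space when N positions are free to mutate.
AMINO_ACIDS = 20  # the 20 canonical amino acids


def sequence_space(n_positions: int) -> int:
    """Number of distinct sequences over n_positions variable residues."""
    return AMINO_ACIDS ** n_positions


def fraction_covered(n_positions: int, variants_screened: int) -> float:
    """Fraction of the space a screen of the given size could sample (capped at 1)."""
    return min(1.0, variants_screened / sequence_space(n_positions))


if __name__ == "__main__":
    budget = 10**6  # assumed screening budget, for illustration only
    for n in (2, 4, 6, 10):
        print(f"N={n:2d}  sequences={sequence_space(n):.3e}  "
              f"covered={fraction_covered(n, budget):.2e}")
```

Even with only ten variable positions, a million-variant screen would touch a vanishing fraction of the possible sequences, which is the gap ML-guided prioritization tries to close.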
The Science
The Stability-Activity Trade-Off
A persistent challenge: mutations that increase thermostability often decrease catalytic activity, and vice versa. The fitness landscape contains narrow ridges where both properties improve simultaneously, and ML helps find these ridges.
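One simple way to picture these "ridges" is as the variants that sit on the Pareto front of stability and activity while also beating the wild type on both. The sketch below assumes each variant has a measured change in melting temperature and a relative activity; the variant names and numbers are hypothetical toy values, not data from any study.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    name: str
    delta_tm: float      # change in melting temperature vs wild type (deg C)
    rel_activity: float  # activity relative to wild type (1.0 = unchanged)


# Hypothetical toy measurements, for illustration only.
VARIANTS = [
    Variant("mut_A", 5.0, 0.6),   # more stable, less active
    Variant("mut_B", -3.0, 1.8),  # more active, less stable
    Variant("mut_C", 2.0, 1.3),   # better on both axes
    Variant("mut_D", 1.0, 1.1),   # better on both, but beaten by mut_C
    Variant("mut_E", -1.0, 0.9),  # worse on both
]


def dominates(a: Variant, b: Variant) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    return (a.delta_tm >= b.delta_tm and a.rel_activity >= b.rel_activity
            and (a.delta_tm > b.delta_tm or a.rel_activity > b.rel_activity))


def pareto_front(variants: List[Variant]) -> List[Variant]:
    """Variants that no other variant dominates."""
    return [v for v in variants if not any(dominates(o, v) for o in variants)]


def improves_both(v: Variant) -> bool:
    """True if the variant beats wild type on stability and activity."""
    return v.delta_tm > 0 and v.rel_activity > 1.0


if __name__ == "__main__":
    ridge = [v.name for v in pareto_front(VARIANTS) if improves_both(v)]
    print("Pareto front:", [v.name for v in pareto_front(VARIANTS)])
    print("Improves both and non-dominated:", ridge)
```

In practice the two properties would come from model predictions or assays rather than a hand-written list, and the front would be recomputed as new measurements arrive.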
2025 Breakthrough: iCASE Strategy
A 2025 Nature Communications study (Zheng et al., 2025) introduces iCASE (isothermal Compressibility-Assisted dynamic Squeezing index perturbation Engineering):
- Physics-informed features: Molecular dynamics simulations extract compressibility and flexibility metrics for each residue
- Hierarchical ML model: Neural networks identify positions where mutations improve stability without sacrificing activity
- Result: The study reports simultaneous improvement of thermostability and catalytic efficiency in industrial enzymes, overcoming the stability-activity trade-off that has long challenged enzyme engineering
MODIFY: Fitness-Diversity Co-Optimization
A 2024 Nature Communications study (Ding et al., 2024) presents MODIFY, an ML algorithm that designs combinatorial mutant libraries by co-optimizing two objectives:
- Fitness: Predicted activity/stability scores
- Diversity: Functional diversity within the library to explore multiple solutions
- Result: Applied to cytochrome c engineering, MODIFY reportedly achieved a 5x improvement in previously uncharacterised functions
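The idea of trading predicted fitness against library diversity can be sketched with a simple greedy selection. This is not the MODIFY algorithm itself, only a minimal illustration of the trade-off: the scoring rule (predicted fitness plus a weighted Hamming distance to the sequences already chosen), the `diversity_weight` parameter, and the toy sequences and scores are all assumptions made for the example.

```python
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def select_library(predicted: Dict[str, float], size: int,
                   diversity_weight: float) -> List[str]:
    """Greedily pick `size` sequences, balancing fitness and diversity.

    Each step adds the candidate maximizing
        predicted fitness + diversity_weight * (min Hamming distance to picks).
    With diversity_weight = 0 this reduces to taking the top-scoring sequences.
    """
    if size <= 0 or not predicted:
        return []
    remaining = dict(predicted)
    first = max(remaining, key=remaining.get)  # start from the best prediction
    library = [first]
    del remaining[first]
    while remaining and len(library) < size:
        best = max(
            remaining,
            key=lambda s: remaining[s]
            + diversity_weight * min(hamming(s, chosen) for chosen in library),
        )
        library.append(best)
        del remaining[best]
    return library


# Hypothetical predicted fitness for variants at four mutated sites.
PREDICTED = {
    "AVLK": 0.90, "AVLR": 0.88, "AVIK": 0.85,
    "MVLK": 0.80, "GSTW": 0.60, "GSTF": 0.55,
}

if __name__ == "__main__":
    print("fitness only:     ", select_library(PREDICTED, 3, 0.0))
    print("fitness+diversity:", select_library(PREDICTED, 3, 0.2))
```

With the diversity term switched off, the library collapses onto near-identical sequences around the top prediction; with it switched on, a lower-scoring but distant sequence enters the library, which is the kind of exploration the diversity objective is meant to buy.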
TeleProt: Blending Evolution and Experiment
A 2025 Cell Systems paper (Thomas et al., 2025) introduces TeleProt, which combines:
- Evolutionary signals: Protein language models (ESM-2) capture natural sequence constraints
- Experimental feedback: Active learning from high-throughput screening data
- Result: The authors report better top-performing nuclease variants than directed evolution alone, with higher hit rates for diverse, high-activity variants
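The combination of an evolutionary prior with experimental feedback can be pictured as an active-learning loop: score candidates, assay the most promising batch, refit, and repeat. The sketch below is a toy under heavy assumptions and is not TeleProt's method. The `prior_score` function is a stand-in for a protein language model score, `assay` is a made-up additive function standing in for a wet-lab screen, and the reduced four-letter alphabet keeps the search space small.

```python
import itertools
from collections import defaultdict
from typing import Dict, Tuple

SITES = 3
ALPHABET = "AVLK"  # reduced alphabet so the toy space has only 4**3 = 64 variants

# Hypothetical per-residue values; neither table comes from real data.
_TRUE_EFFECT = {"A": 0.0, "V": 0.3, "L": 0.6, "K": -0.2}
_PRIOR = {"A": 0.2, "V": 0.1, "L": 0.1, "K": -0.3}


def assay(seq: str) -> float:
    """Stand-in for a screening measurement (toy additive landscape)."""
    return sum(_TRUE_EFFECT[r] * (i + 1) for i, r in enumerate(seq))


def prior_score(seq: str) -> float:
    """Stand-in for an evolutionary or language-model plausibility score."""
    return sum(_PRIOR[r] for r in seq)


def fit_effects(measured: Dict[str, float]) -> Dict[Tuple[int, str], float]:
    """Mean measured activity of variants carrying residue r at site i."""
    sums: Dict[Tuple[int, str], float] = defaultdict(float)
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for seq, y in measured.items():
        for i, r in enumerate(seq):
            sums[(i, r)] += y
            counts[(i, r)] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predict(seq: str, effects: Dict[Tuple[int, str], float]) -> float:
    """Prior score plus the data-derived term (zero for unseen residues)."""
    return prior_score(seq) + sum(effects.get((i, r), 0.0)
                                  for i, r in enumerate(seq))


def active_learning(rounds: int, batch: int) -> Dict[str, float]:
    """Run the loop and return every measured variant with its assay value."""
    pool = ["".join(p) for p in itertools.product(ALPHABET, repeat=SITES)]
    measured: Dict[str, float] = {}
    for _ in range(rounds):
        effects = fit_effects(measured)  # empty in round 1: prior only
        candidates = [s for s in pool if s not in measured]
        candidates.sort(key=lambda s: predict(s, effects), reverse=True)
        for seq in candidates[:batch]:
            measured[seq] = assay(seq)
    return measured


if __name__ == "__main__":
    results = active_learning(rounds=3, batch=4)
    best = max(results, key=results.get)
    print(f"measured {len(results)} variants; best so far: {best} "
          f"({results[best]:.2f})")
```

A purely greedy loop like this one can get stuck near its early picks; real pipelines typically add an exploration or diversity term and a far richer model, but the round-by-round structure is the same.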
The New Enzyme Engineering Workflow
| Step | Traditional | ML-Guided |
|---|---|---|
| Target identification | Literature + intuition | Computational fitness prediction |
| Library design | Random/saturation mutagenesis | Smart library (MODIFY, ProteusAI) |
| Screening | 10⁴–10⁶ variants | 10²–10³ variants (ML-prioritized) |
| Iterations | 5–10 rounds | 2–3 rounds |
| Timeline | 1–3 years | 3–12 months |
| Success rate | 1–5% hit rate | 20–50% hit rate |
Industrial Applications
- Plastic degradation: Engineered PETases with 100x improved thermostability for PET recycling at industrial temperatures
- Pharmaceutical synthesis: Enantioselective enzymes replacing heavy metal catalysts in API manufacturing
- Textile processing: Thermostable cellulases and laccases for eco-friendly fabric treatment
- Food industry: Lipases and proteases optimized for specific temperature/pH profiles
What To Watch
The integration of AlphaFold-predicted structures with ML-guided engineering is enabling rational design even for enzymes without crystal structures. Foundation models for protein function prediction (analogous to GPT for text) are emerging, promising few-shot enzyme optimization. Expect ML-designed enzymes to dominate new industrial biocatalysis applications by 2028.
References
Zheng, N., Cai, Y., Zhang, Z., Zhou, H., Deng, Y., Du, S., et al. (2025). Tailoring industrial enzymes for thermostability and activity evolution by the machine learning-based iCASE strategy. Nature Communications, 16(1).
Ding, K., Chin, M., Zhao, Y., Huang, W., Mai, B. K., Wang, H., et al. (2024). Machine learning-guided co-optimization of fitness and diversity facilitates combinatorial library design in enzyme engineering. Nature Communications, 15(1).
Thomas, N., Belanger, D., Xu, C., Lee, H., Hirano, K., Iwai, K., et al. (2025). Engineering highly active nuclease enzymes with machine learning and high-throughput screening. Cell Systems, 16(3), 101236.