
Beyond Persistent Homology: Topological Data Analysis Enters the Deep Learning Era

Persistent homology has been TDA's workhorse for a decade, extracting topological features (loops, voids, connected components) from data. But 2025's research frontier moves beyond it: topological deep learning, Euler characteristic methods, and Reeb graphs are enabling shape-aware AI for molecules, cells, and complex networks.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Topological data analysis starts from a compelling premise: the shape of data carries information that standard statistical methods miss. Two datasets may have identical means, variances, and correlations but fundamentally different topological structure; one might form a single connected cluster while the other forms two separate loops with a void between them. TDA provides mathematical tools to detect, quantify, and compare these structural features.

Persistent homology, the technique that tracks topological features (connected components, loops, voids) as they appear and disappear across scales, has been TDA's primary tool for over a decade. It produces "persistence diagrams" that summarize a dataset's multi-scale topological structure in a compact, interpretable representation.
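To make this concrete, here is a minimal, illustrative sketch of the 0-dimensional case only: tracking connected components of a point cloud as the scale grows, via a Kruskal-style union-find. Real libraries such as Ripser or GUDHI handle higher dimensions far more efficiently; the function name and toy data below are ours, not from the papers under review.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """0-dimensional persistent homology of a point cloud.

    Every point is a component born at scale 0; a component dies at the
    edge length where it first merges into another. This is exactly the
    minimum-spanning-tree computation, done with union-find."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration scale).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    diagram = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            diagram.append((0.0, d))   # (birth, death) of the merged component
    diagram.append((0.0, math.inf))    # one component persists forever
    return diagram

# Two well-separated clusters: four short bars (intra-cluster merges),
# one long bar (the inter-cluster merge), one infinite bar.
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
dgm = h0_persistence(pts)
```

The long finite bar is what signals "two clusters": it records how far the filtration must grow before the two components become one.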

But persistent homology has limitations. It captures only certain topological invariants (Betti numbers); it struggles with noisy data in high dimensions; and its output, persistence diagrams, does not naturally integrate with deep learning architectures that expect vector inputs. The 2025 research frontier, comprehensively reviewed by Su et al., extends TDA beyond these limitations through topological deep learning (TDL): architectures that natively process topological structures.

From Persistence Diagrams to Topological Neural Networks

Su et al.'s review identifies three waves of TDA development:

Wave 1: Classical TDA (2000s-2010s). Persistent homology is applied to static datasets, producing persistence diagrams that are analyzed using statistical methods. Applications include shape recognition, sensor network coverage, and protein structure analysis.

Wave 2: Vectorization (2010s-2020s). Persistence diagrams are converted to vector representations (persistence landscapes, persistence images, Betti curves) that can be used as features in standard ML models. This bridges TDA and ML but treats topology as a preprocessing step, not an integrated component of learning.
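The simplest such vectorization is the Betti curve: sample the persistence diagram on a fixed grid of scales and count how many features are alive at each one, yielding a fixed-length vector for downstream models. A minimal sketch (the diagram and grid values are toy inputs of our choosing):

```python
def betti_curve(diagram, grid):
    """Vectorize a persistence diagram as a Betti curve: for each scale t
    in `grid`, count the intervals [birth, death) alive at t."""
    return [sum(1 for b, d in diagram if b <= t < d) for t in grid]

# Toy diagram: two short-lived features and one persistent one.
dgm = [(0.0, 0.2), (0.1, 0.3), (0.5, 2.0)]
grid = [0.0, 0.15, 0.25, 1.0, 3.0]
vec = betti_curve(dgm, grid)  # -> [1, 2, 1, 1, 0]
```

Persistence landscapes and persistence images follow the same pattern (diagram in, fixed-length vector out) with smoother, more expressive encodings; giotto-tda ships implementations of all three.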

Wave 3: Topological Deep Learning (2020s-present). Neural network architectures are designed to operate directly on topological structures: simplicial complexes, cell complexes, hypergraphs. These architectures learn representations that are inherently topological, rather than converting topology to vectors as an afterthought.

The key architectures in wave 3 include:

  • Simplicial neural networks: Message passing on simplicial complexes (triangles, tetrahedra, higher simplices) rather than just edges
  • Cell complex neural networks: Generalizing simplicial networks to arbitrary cell structures
  • Sheaf neural networks: Neural networks that process data defined over topological sheaves, a highly general framework that subsumes graph neural networks as a special case
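As a hedged illustration of the first family, one simplicial message-passing layer can route edge features through the Hodge 1-Laplacian L1 = B1ᵀB1 + B2B2ᵀ, which couples edges sharing a vertex (via the boundary matrix B1) and edges sharing a triangle (via B2). The toy complex, layer shape, and nonlinearity below are our own choices, not a specific published architecture:

```python
import numpy as np

# Toy complex: a filled triangle (0,1,2) plus a dangling edge (2,3).
# Edge order: (0,1), (0,2), (1,2), (2,3); orientations low -> high vertex.
B1 = np.array([            # vertices x edges boundary matrix
    [-1, -1,  0,  0],      # vertex 0
    [ 1,  0, -1,  0],      # vertex 1
    [ 0,  1,  1, -1],      # vertex 2
    [ 0,  0,  0,  1],      # vertex 3
], dtype=float)
B2 = np.array([[1], [-1], [1], [0]], dtype=float)  # edges x triangles

L1 = B1.T @ B1 + B2 @ B2.T   # Hodge 1-Laplacian (edges x edges)

def simplicial_layer(X, W):
    """One layer: propagate per-edge features X through L1, mix with W, ReLU."""
    return np.maximum(L1 @ X @ W, 0.0)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # one 3-dim feature vector per edge
W = rng.normal(size=(3, 3))   # learnable weights (random here)
H = simplicial_layer(X, W)    # updated edge representations, shape (4, 3)
```

The point of the construction: a plain graph neural network sees only vertex-to-vertex adjacency, whereas L1 lets the edge (2,3) and the triangle edges exchange different kinds of messages depending on whether they meet at a vertex or bound a common triangle.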

Molecular TDA: Chemistry Through a Topological Lens

Wee & Jiang (published in the Journal of Chemical Information and Modeling) provide the most comprehensive review of TDA applications in molecular science. Molecules have rich topological structure: covalent bonds form graphs, protein surfaces form 2-manifolds, binding pockets form cavities (voids in the topological sense), and molecular interactions form higher-order complexes.

The review identifies several areas where TDA provides advantages over traditional molecular descriptors:

  • Protein-ligand binding: Persistent homology captures the shape of the binding pocket more faithfully than geometric descriptors (volume, surface area), because topological features are invariant to continuous deformations
  • Molecular generation: Topological constraints (ring count, connectivity) provide useful inductive biases for generative models that produce valid molecular structures
  • Drug toxicity prediction: Topological features of molecular interaction networks correlate with toxicity in ways that individual molecular properties do not capture
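The ring-count constraint mentioned above is precisely a Betti number: for a molecular graph, β0 counts connected components and β1 = E - V + β0 counts independent rings (the circuit rank). A minimal sketch, using benzene's carbon skeleton as a toy input (the graph encoding is our illustration, not from the review):

```python
def graph_betti(n_vertices, edges):
    """Betti numbers of a graph: beta0 = connected components,
    beta1 = independent cycles = E - V + beta0 (circuit rank)."""
    parent = list(range(n_vertices))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[rv] = ru
    beta0 = sum(1 for i in range(n_vertices) if find(i) == i)
    beta1 = len(edges) - n_vertices + beta0
    return beta0, beta1

benzene = [(i, (i + 1) % 6) for i in range(6)]  # 6-carbon ring
b0, b1 = graph_betti(6, benzene)                # -> (1, 1): one component, one ring
```

A generative model can check candidates against such invariants cheaply, which is what makes them attractive as inductive biases or validity filters.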

Single-Cell Biology: Topology of Cell Populations

Hernández-Lemus applies TDA to single-cell biology, a domain where the "shape" of data has direct biological meaning. Single-cell RNA sequencing measures gene expression in individual cells, producing datasets with thousands of dimensions (one per gene) and thousands to millions of data points (one per cell).

The topological structure of these datasets reflects biological processes:

  • Branches in the data manifold correspond to cell differentiation trajectories
  • Loops correspond to cell cycle dynamics
  • Disconnected components correspond to distinct cell types

Persistent homology detects these features without requiring prior knowledge of the cell types or differentiation programs, a data-driven approach to biological discovery that complements the hypothesis-driven tradition of molecular biology.

Claims and Evidence

| Claim | Evidence | Verdict |
| --- | --- | --- |
| TDA captures structural information that standard statistics miss | Well-established in the mathematical community with numerous examples | ✅ Well-established |
| Topological deep learning outperforms vectorized TDA approaches | Su et al. review evidence of improvement on specific benchmarks | ✅ Supported (task-dependent) |
| TDA provides useful features for molecular property prediction | Wee & Jiang review extensive evidence across molecular tasks | ✅ Supported |
| TDA reveals biological structure in single-cell data | Hernández-Lemus demonstrates topological recovery of known differentiation trajectories | ✅ Supported |
| TDA is computationally scalable to large datasets | Computational cost of persistent homology limits scalability; approximations help | ⚠️ Improving but still a constraint |

Open Questions

  • Computational scalability: Persistent homology has worst-case cubic complexity in the number of simplices. For large datasets (millions of points in high dimensions), this is prohibitive. Can approximate TDA methods maintain topological fidelity while scaling to massive data?
  • Statistical inference on topological features: When persistent homology detects a feature (a loop, a void), is it statistically significant or an artifact of sampling noise? The emerging field of statistical TDA provides hypothesis tests and confidence sets for topological features, but the methods are not yet widely adopted.
  • Integration with foundation models: Can topological features be integrated into LLM-based scientific reasoning? For instance, could an LLM that understands molecular topology reason about drug-target interactions using topological representations?
  • Higher-order interactions: Standard graph neural networks model pairwise interactions. Topological deep learning models higher-order interactions (simplicial, cellular). For which applications do higher-order interactions provide meaningful improvement?
  • Interpretability: Topological features (persistent homology classes, Betti numbers) have clear mathematical definitions but may lack intuitive biological or physical interpretation. How do we bridge the mathematical rigor of TDA with domain-specific interpretability?
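On the scalability question, one widely used mitigation is to compute persistent homology on a small set of landmarks rather than all n points, chosen by farthest-point (maxmin) sampling so the landmarks still trace the shape of the data. A minimal sketch (the function and the toy circle are illustrative):

```python
import math

def maxmin_landmarks(points, k, seed=0):
    """Farthest-point (maxmin) sampling: greedily pick k landmarks, each
    time taking the point farthest from those already chosen. Running
    persistent homology on the landmarks instead of all points is a
    standard way to tame its worst-case cost."""
    chosen = [seed]
    dist_to_set = [math.dist(p, points[seed]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: dist_to_set[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            dist_to_set[i] = min(dist_to_set[i], math.dist(p, points[nxt]))
    return chosen

# 100 points on a circle, reduced to 8 landmarks that still outline the loop.
pts = [(math.cos(2 * math.pi * t / 100), math.sin(2 * math.pi * t / 100))
       for t in range(100)]
idx = maxmin_landmarks(pts, 8)
```

The fidelity question raised above is exactly whether the landmark complex preserves the true persistence diagram, and stability theorems give partial guarantees in terms of how densely the landmarks cover the data.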
What This Means for Your Research

For data scientists, TDA provides a complementary lens to standard methods, one that captures structural properties (connectivity, loops, voids) that correlations and distributions miss. The investment in learning TDA is repaid in domains where data has genuine topological structure: molecular science, biology, neuroscience, and complex networks.

For mathematicians, the TDA-to-TDL pipeline provides a compelling application trajectory for algebraic topology, from abstract mathematical tools to deployed machine learning architectures. The theoretical questions raised by topological deep learning (expressivity of simplicial networks, stability of topological features under noise) are genuine mathematical research problems.

For domain scientists in chemistry and biology, the message is that TDA is no longer a curiosity: it is an established tool with demonstrated value for molecular property prediction, single-cell analysis, and biological network modeling. The 2025 tools are more accessible than ever, with software packages (Ripser, GUDHI, giotto-tda) that make persistent homology computation routine.
References (3)

[1] Su, Z., Liu, X., Bou Hamdan, L., et al. (2025). Topological data analysis and topological deep learning beyond persistent homology: a review. Artificial Intelligence Review.
[2] Wee, J. & Jiang, J. (2025). A Review of Topological Data Analysis and Topological Deep Learning in Molecular Sciences. J. Chem. Inf. Model.
[3] Hernández-Lemus, E. (2025). Topological data analysis in single cell biology. Frontiers in Immunology.
