Optimal transport (OT) theory asks a deceptively simple question: given two probability distributions, what is the most efficient way to transform one into the other? The "cost" of transformation—measured by the Wasserstein distance—provides a geometry on the space of probability distributions that has proven remarkably useful across machine learning, from generative modeling (Wasserstein GANs) to domain adaptation to single-cell biology.
But most OT applications assume that the underlying data lives in flat Euclidean space. Real data often does not. Probability distributions on spheres (directional data), hyperbolic spaces (hierarchical data), manifolds of positive definite matrices (covariance data), and more general Riemannian manifolds require OT theory that respects the geometry of the space on which distributions are defined.
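For reference, the quantity at the center of all of this is the p-Wasserstein distance; nothing in its definition requires Euclidean space, only a ground metric d, which on a manifold is the geodesic distance:

```latex
W_p(\mu, \nu) \;=\; \Big( \inf_{\pi \in \Pi(\mu, \nu)} \int d(x, y)^p \,\mathrm{d}\pi(x, y) \Big)^{1/p}
```

Here Π(μ, ν) is the set of couplings of μ and ν (joint distributions whose marginals are μ and ν), and d is the ground metric: Euclidean distance in ℝⁿ, geodesic distance on a curved space.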
The 2025-2026 research frontier extends OT to these curved settings, with implications that span from pure mathematics (the geometry of probability spaces) to applied machine learning (training neural networks on non-Euclidean domains).
Neural Optimal Transport on Manifolds
Micheli et al. (2026) present the most direct application: neural OT methods that compute transport maps between distributions on Riemannian manifolds. Standard neural OT methods—which use neural networks to learn transport maps from data—are tailored to Euclidean geometry. They parameterize transport maps as functions from ℝⁿ to ℝⁿ and optimize using Euclidean gradients.
On a Riemannian manifold, this approach fails: the transport map must respect the manifold's geometry—mapping points on the manifold to other points on the manifold while remaining consistent with the curved metric. Micheli et al. solve this by parameterizing transport maps as compositions of exponential maps (which map tangent vectors to manifold points) and geodesic interpolations (which define curves of minimal length on the manifold).
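To make the exponential-map ingredient concrete, here is a minimal sketch on the unit sphere S², not Micheli et al.'s full method: a transport map of the form T(x) = exp_x(v(x)), where `tangent_net` is a hypothetical stand-in for a learned function that outputs a tangent vector at each point.

```python
import numpy as np

def sphere_exp(x, v, eps=1e-12):
    """Exponential map on the unit sphere: follow the geodesic from x along tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < eps:
        return x
    return np.cos(norm_v) * x + np.sin(norm_v) * (v / norm_v)

def project_to_tangent(x, u):
    """Project an ambient vector u onto the tangent space at x (the plane orthogonal to x)."""
    return u - np.dot(u, x) * x

def transport_map(x, tangent_net):
    """Riemannian transport map T(x) = exp_x(v(x)); `tangent_net` is a placeholder
    for whatever learned model produces the (ambient) direction field."""
    v = project_to_tangent(x, tangent_net(x))
    return sphere_exp(x, v)

# Toy usage: a fixed "network" that nudges every point toward the north pole.
toy_net = lambda x: 0.5 * np.array([0.0, 0.0, 1.0])
x = np.array([1.0, 0.0, 0.0])
print(transport_map(x, toy_net))  # output stays on the sphere: its norm is 1
```

By construction the output of `transport_map` is again a point on the sphere, which is exactly the constraint a Euclidean ℝⁿ → ℝⁿ parameterization cannot guarantee.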
The practical significance: any application where data naturally lives on a manifold—molecular conformations on the space of rotation matrices, brain connectivity patterns on the space of positive definite matrices, wind directions on the sphere—can now benefit from neural OT methods that respect the data's intrinsic geometry.
Geometric-Entropic Optimization for Training
Ferrara (2026) integrates OT with Riemannian gradient methods for neural network training. The core insight: training a neural network can be framed as a problem in the geometry of probability distributions. The model's parameters define a probability distribution over outputs; training moves this distribution toward the target distribution of correct outputs. OT provides a natural measure of the distance between these distributions.
The geometric-entropic formulation adds an entropic regularization term to the OT distance—smoothing the optimization landscape and enabling efficient gradient computation. The Riemannian gradient methods then navigate this landscape along geodesics of the parameter manifold, respecting the natural geometry of the optimization problem.
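To make "entropic regularization" concrete: in the discrete setting the regularized OT problem is typically solved with Sinkhorn iterations. The sketch below is that standard construction, not Ferrara's specific geometric-entropic formulation.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=200):
    """Entropy-regularized OT between histograms a and b with cost matrix C:
    approximately minimizes <P, C> - eps * H(P) subject to the marginal constraints."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # alternate scaling updates
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan P = diag(u) K diag(v)

# Toy example: two histograms on a 1-D grid with squared-distance cost.
x = np.linspace(0, 1, 50)
a = np.exp(-((x - 0.2) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.02); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2
P = sinkhorn(a, b, C)
print("regularized OT cost:", np.sum(P * C))
```

The regularization parameter eps controls the trade-off the text describes: larger eps gives a smoother, easier-to-differentiate objective at the price of a blurrier transport plan.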
This is more than a mathematical curiosity. Standard gradient descent treats all parameter directions equally, but the parameter space of a neural network has a natural Riemannian structure (the Fisher information metric) where some directions correspond to large changes in model behavior and others to negligible changes. Riemannian optimization that respects this structure can converge faster and to better optima than standard methods.
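A minimal illustration of that preconditioning idea: a natural-gradient step rescales the Euclidean gradient by the inverse Fisher information matrix, so stiff directions take small steps and flat directions take large ones. The sketch assumes the Fisher matrix is already available; in practice it is estimated or approximated (e.g., diagonally or with Kronecker factorizations).

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-3):
    """One natural-gradient update: theta <- theta - lr * F^{-1} grad,
    with a small damping term added to F for numerical stability."""
    F = fisher + damping * np.eye(len(theta))
    return theta - lr * np.linalg.solve(F, grad)

# Toy usage with a quadratic loss and a hand-picked Fisher matrix:
# one stiff direction (large curvature) and one flat direction.
theta = np.array([1.0, -2.0])
grad = 2 * theta                       # gradient of ||theta||^2
fisher = np.diag([4.0, 0.25])
print(natural_gradient_step(theta, grad, fisher))
```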
Operator Learning on Complex Geometries
Li et al. extend OT to operator learning—learning mappings between function spaces, as required for solving partial differential equations (PDEs) on complex geometries. Their approach generalizes discretized meshes (the standard representation for PDE domains) to mesh density functions—probability distributions over the domain that OT can transport between different discretizations.
This abstraction is practically valuable because it enables operator learning that is mesh-independent: the learned operator works for any mesh resolution or configuration, not just the specific mesh used during training. For engineering applications where PDE solutions must be computed on different meshes for different geometries, mesh-independent operators avoid the prohibitive cost of retraining for each geometry.
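To illustrate the mesh-density idea in the simplest possible setting (a sketch, not Li et al.'s method): two different 1-D discretizations of the same domain can each be turned into a probability distribution over node locations, weighted by cell size, and then compared, or transported, with a closed-form 1-D Wasserstein distance.

```python
import numpy as np

def mesh_density(nodes):
    """Turn 1-D mesh nodes into a 'mesh density': cell midpoints weighted by
    cell lengths, normalized to a probability distribution."""
    mids = 0.5 * (nodes[:-1] + nodes[1:])
    w = np.diff(nodes)
    return mids, w / w.sum()

def wasserstein_1d(x, wx, y, wy, n_quantiles=1000):
    """Approximate W2 distance between two weighted 1-D point clouds via the quantile formula."""
    q = np.linspace(0, 1, n_quantiles, endpoint=False) + 0.5 / n_quantiles
    qx = np.interp(q, np.cumsum(wx), x)   # approximate quantile (inverse CDF) of the first density
    qy = np.interp(q, np.cumsum(wy), y)
    return np.sqrt(np.mean((qx - qy) ** 2))

# Two discretizations of [0, 1]: uniform vs. refined near the left boundary.
coarse = np.linspace(0.0, 1.0, 11)
refined = np.concatenate([np.linspace(0.0, 0.2, 15), np.linspace(0.25, 1.0, 8)])
xc, wc = mesh_density(coarse)
xr, wr = mesh_density(refined)
print("OT distance between mesh densities:", wasserstein_1d(xc, wc, xr, wr))
```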
The Geometry of Probability Spaces
Gomes et al. provide foundational mathematics that underpins these applications: an intrinsic development of the Riemannian geometry of the Wasserstein space on the unit circle. Building on work by Otto, Lott, and Villani, they develop the geometric tools—curvature, geodesics, parallel transport—needed to do calculus on the space of probability distributions.
This is pure mathematics with applied consequences. The geometric properties of Wasserstein space determine the convergence behavior of optimization algorithms that operate on probability distributions. The curvature of the probability space, for instance, constrains which functionals are geodesically convex along Wasserstein geodesics, and geodesic convexity is what gives gradient-based methods their convergence guarantees; where it fails, convergence can be slow or merely local.
Claims and Evidence
| Claim | Evidence | Verdict |
|---|---|---|
| Standard OT methods fail on non-Euclidean data | Euclidean parameterization violates manifold constraints | ✅ Mathematical fact |
| Riemannian neural OT respects manifold geometry | Micheli et al. demonstrate geodesic-aware transport maps | ✅ Supported |
| OT-based training improves neural network optimization | Ferrara shows convergence benefits of geometric-entropic formulation | ✅ Supported (theoretical + experimental) |
| Mesh-independent operator learning is feasible via OT | Li et al. demonstrate across multiple PDE domains | ✅ Supported |
| Wasserstein geometry on manifolds is fully understood | Active research area; many open questions remain | ⚠️ Partially understood |
What This Means for Your Research
For applied mathematicians, the extension of OT to Riemannian manifolds opens a rich theory that combines differential geometry, probability theory, and optimization—fields that have traditionally developed in relative isolation. The problems are mathematically deep and practically relevant.
For machine learning researchers, Riemannian OT provides tools for domains where Euclidean assumptions are inappropriate—and many important domains are non-Euclidean: rotations in robotics, shapes in computer vision, covariance matrices in brain imaging, phylogenetic trees in biology.
For computational scientists, mesh-independent operator learning (Li et al.) addresses a practical bottleneck in scientific computing: the need to retrain models for each new mesh. OT-based abstraction enables learned solvers that generalize across geometries, potentially accelerating engineering design cycles.
The message across all these applications is consistent: geometry matters. When data, models, or optimization landscapes have non-trivial geometric structure, methods that respect that structure outperform those that ignore it. Optimal transport provides the mathematical language for expressing and exploiting geometric structure in probability and optimization.