Optimal transport (OT) theory asks a deceptively simple question: given two probability distributions, what is the most efficient way to transform one into the other? The "cost" of transformation—measured by the Wasserstein distance—provides a geometry on the space of probability distributions that has proven remarkably useful across machine learning, from generative modeling (Wasserstein GANs) to domain adaptation to single-cell biology.
But most OT applications assume that the underlying data lives in flat Euclidean space. Real data often does not. Probability distributions on spheres (directional data), hyperbolic spaces (hierarchical data), manifolds of positive definite matrices (covariance data), and more general Riemannian manifolds require OT theory that respects the geometry of the space on which distributions are defined.
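For reference, the quantity at the center of all of this is the p-Wasserstein distance; nothing in its definition requires Euclidean space, only a ground metric d, which on a manifold is the geodesic distance:

```latex
W_p(\mu, \nu) \;=\; \Big( \inf_{\pi \in \Pi(\mu, \nu)} \int d(x, y)^p \,\mathrm{d}\pi(x, y) \Big)^{1/p}
```

Here Π(μ, ν) is the set of couplings of μ and ν (joint distributions whose marginals are μ and ν), and d is the ground metric: Euclidean distance in ℝⁿ, geodesic distance on a curved space.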
The 2025-2026 research frontier extends OT to these curved settings, with implications that span from pure mathematics (the geometry of probability spaces) to applied machine learning (training neural networks on non-Euclidean domains).
Neural Optimal Transport on Manifolds
Micheli et al. (2026) present the most direct application: neural OT methods that compute transport maps between distributions on Riemannian manifolds. Standard neural OT methods—which use neural networks to learn transport maps from data—are tailored to Euclidean geometry. They parameterize transport maps as functions from ℝⁿ to ℝⁿ and optimize using Euclidean gradients.
On a Riemannian manifold, this approach fails: the transport map must respect the manifold's geometry—mapping points on the manifold to other points on the manifold while remaining consistent with the curved metric. Micheli et al. solve this by parameterizing transport maps as compositions of exponential maps (which map tangent vectors to manifold points) and geodesic interpolations (which define curves of minimal length on the manifold).
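To make the exponential-map ingredient concrete, here is a minimal sketch on the unit sphere S², not Micheli et al.'s full method: a transport map of the form T(x) = exp_x(v(x)), where `tangent_net` is a hypothetical stand-in for a learned function that outputs a tangent vector at each point.

```python
import numpy as np

def sphere_exp(x, v, eps=1e-12):
    """Exponential map on the unit sphere: follow the geodesic from x along tangent vector v."""
    norm_v = np.linalg.norm(v)
    if norm_v < eps:
        return x
    return np.cos(norm_v) * x + np.sin(norm_v) * (v / norm_v)

def project_to_tangent(x, u):
    """Project an ambient vector u onto the tangent space at x (the plane orthogonal to x)."""
    return u - np.dot(u, x) * x

def transport_map(x, tangent_net):
    """Riemannian transport map T(x) = exp_x(v(x)); `tangent_net` is a placeholder
    for whatever learned model produces the (ambient) direction field."""
    v = project_to_tangent(x, tangent_net(x))
    return sphere_exp(x, v)

# Toy usage: a fixed "network" that nudges every point toward the north pole.
toy_net = lambda x: 0.5 * np.array([0.0, 0.0, 1.0])
x = np.array([1.0, 0.0, 0.0])
print(transport_map(x, toy_net))  # output stays on the sphere: its norm is 1
```

By construction the output of `transport_map` is again a point on the sphere, which is exactly the constraint a Euclidean ℝⁿ → ℝⁿ parameterization cannot guarantee.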
The practical significance: any application where data naturally lives on a manifold—molecular conformations on the space of rotation matrices, brain connectivity patterns on the space of positive definite matrices, wind directions on the sphere—can now benefit from neural OT methods that respect the data's intrinsic geometry.
Geometric-Entropic Optimization for Training
Ferrara (2026) integrates OT with Riemannian gradient methods for neural network training. The core insight: training a neural network can be framed as a problem in the geometry of probability distributions. The model's parameters define a probability distribution over outputs; training moves this distribution toward the target distribution of correct outputs. OT provides a natural measure of the distance between these distributions.
The geometric-entropic formulation adds an entropic regularization term to the OT distance—smoothing the optimization landscape and enabling efficient gradient computation. The Riemannian gradient methods then navigate this landscape along geodesics of the parameter manifold, respecting the natural geometry of the optimization problem.
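To make "entropic regularization" concrete: in the discrete setting the regularized OT problem is typically solved with Sinkhorn iterations. The sketch below is that standard construction, not Ferrara's specific geometric-entropic formulation.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=200):
    """Entropy-regularized OT between histograms a and b with cost matrix C:
    approximately minimizes <P, C> - eps * H(P) subject to the marginal constraints."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # alternate scaling updates
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # transport plan P = diag(u) K diag(v)

# Toy example: two histograms on a 1-D grid with squared-distance cost.
x = np.linspace(0, 1, 50)
a = np.exp(-((x - 0.2) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.02); b /= b.sum()
C = (x[:, None] - x[None, :]) ** 2
P = sinkhorn(a, b, C)
print("regularized OT cost:", np.sum(P * C))
```

The regularization parameter eps controls the trade-off the text describes: larger eps gives a smoother, easier-to-differentiate objective at the price of a blurrier transport plan.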
This is more than a mathematical curiosity. Standard gradient descent treats all parameter directions equally, but the parameter space of a neural network has a natural Riemannian structure (the Fisher information metric) where some directions correspond to large changes in model behavior and others to negligible changes. Riemannian optimization that respects this structure can converge faster and to better optima than standard methods.
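A minimal illustration of that preconditioning idea: a natural-gradient step rescales the Euclidean gradient by the inverse Fisher information matrix, so stiff directions take small steps and flat directions take large ones. The sketch assumes the Fisher matrix is already available; in practice it is estimated or approximated (e.g., diagonally or with Kronecker factorizations).

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-3):
    """One natural-gradient update: theta <- theta - lr * F^{-1} grad,
    with a small damping term added to F for numerical stability."""
    F = fisher + damping * np.eye(len(theta))
    return theta - lr * np.linalg.solve(F, grad)

# Toy usage with a quadratic loss and a hand-picked Fisher matrix:
# one stiff direction (large curvature) and one flat direction.
theta = np.array([1.0, -2.0])
grad = 2 * theta                       # gradient of ||theta||^2
fisher = np.diag([4.0, 0.25])
print(natural_gradient_step(theta, grad, fisher))
```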
Operator Learning on Complex Geometries
Li et al. extend OT to operator learning—learning mappings between function spaces, as required for solving partial differential equations (PDEs) on complex geometries. Their approach generalizes discretized meshes (the standard representation for PDE domains) to mesh density functions—probability distributions over the domain that OT can transport between different discretizations.
This abstraction is practically valuable because it enables operator learning that is mesh-independent: the learned operator works for any mesh resolution or configuration, not just the specific mesh used during training. For engineering applications where PDE solutions must be computed on different meshes for different geometries, mesh-independent operators avoid the prohibitive cost of retraining for each geometry.
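To illustrate the mesh-density idea in the simplest possible setting (a sketch, not Li et al.'s method): two different 1-D discretizations of the same domain can each be turned into a probability distribution over node locations, weighted by cell size, and then compared, or transported, with a closed-form 1-D Wasserstein distance.

```python
import numpy as np

def mesh_density(nodes):
    """Turn 1-D mesh nodes into a 'mesh density': cell midpoints weighted by
    cell lengths, normalized to a probability distribution."""
    mids = 0.5 * (nodes[:-1] + nodes[1:])
    w = np.diff(nodes)
    return mids, w / w.sum()

def wasserstein_1d(x, wx, y, wy, n_quantiles=1000):
    """Approximate W2 distance between two weighted 1-D point clouds via the quantile formula."""
    q = np.linspace(0, 1, n_quantiles, endpoint=False) + 0.5 / n_quantiles
    qx = np.interp(q, np.cumsum(wx), x)   # approximate quantile (inverse CDF) of the first density
    qy = np.interp(q, np.cumsum(wy), y)
    return np.sqrt(np.mean((qx - qy) ** 2))

# Two discretizations of [0, 1]: uniform vs. refined near the left boundary.
coarse = np.linspace(0.0, 1.0, 11)
refined = np.concatenate([np.linspace(0.0, 0.2, 15), np.linspace(0.25, 1.0, 8)])
xc, wc = mesh_density(coarse)
xr, wr = mesh_density(refined)
print("OT distance between mesh densities:", wasserstein_1d(xc, wc, xr, wr))
```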
The Geometry of Probability Spaces
Gomes et al. provide foundational mathematics that underpins these applications: an intrinsic development of the Riemannian geometry of the Wasserstein space on the unit circle. Building on work by Otto, Lott, and Villani, they develop the geometric tools—curvature, geodesics, parallel transport—needed to do calculus on the space of probability distributions.
This is pure mathematics with applied consequences. The geometric properties of Wasserstein space determine the convergence behavior of optimization algorithms that operate on probability distributions. The curvature of the probability space, for instance, constrains which functionals are geodesically convex along Wasserstein geodesics, and geodesic convexity is what gives gradient-based methods their convergence guarantees; where it fails, convergence can be slow or merely local.
Claims and Evidence
| Claim | Evidence | Verdict |
|---|---|---|
| Standard OT methods fail on non-Euclidean data | Euclidean parameterization violates manifold constraints | ✅ Mathematical fact |
| Riemannian neural OT respects manifold geometry | Micheli et al. demonstrate geodesic-aware transport maps | ✅ Supported |
| OT-based training improves neural network optimization | Ferrara shows convergence benefits of geometric-entropic formulation | ✅ Supported (theoretical + experimental) |
| Mesh-independent operator learning is feasible via OT | Li et al. demonstrate across multiple PDE domains | ✅ Supported |
| Wasserstein geometry on manifolds is fully understood | Active research area; many open questions remain | ⚠️ Partially understood |
What This Means for Your Research
For applied mathematicians, the extension of OT to Riemannian manifolds opens a rich theory that combines differential geometry, probability theory, and optimization—fields that have traditionally developed in relative isolation. The problems are mathematically deep and practically relevant.
For machine learning researchers, Riemannian OT provides tools for domains where Euclidean assumptions are inappropriate—and many important domains are non-Euclidean: rotations in robotics, shapes in computer vision, covariance matrices in brain imaging, phylogenetic trees in biology.
For computational scientists, mesh-independent operator learning (Li et al.) addresses a practical bottleneck in scientific computing: the need to retrain models for each new mesh. OT-based abstraction enables learned solvers that generalize across geometries, potentially accelerating engineering design cycles.
The message across all these applications is consistent: geometry matters. When data, models, or optimization landscapes have non-trivial geometric structure, methods that respect that structure outperform those that ignore it. Optimal transport provides the mathematical language for expressing and exploiting geometric structure in probability and optimization.