
Optimal Transport on Curved Spaces: When Wasserstein Geometry Meets Neural Network Training

Optimal transport theory—measuring the most efficient way to move one probability distribution to another—has become a powerful tool in machine learning. The 2025-2026 frontier extends OT to curved Riemannian manifolds, enabling geometric neural network training and operator learning on complex domains.

By Sean K.S. Shin
This blog summarizes research trends based on published paper abstracts. Specific numbers or findings may contain inaccuracies. For scholarly rigor, always consult the original papers cited in each post.

Optimal transport (OT) theory asks a deceptively simple question: given two probability distributions, what is the most efficient way to transform one into the other? The "cost" of transformation—measured by the Wasserstein distance—provides a geometry on the space of probability distributions that has proven remarkably useful across machine learning, from generative modeling (Wasserstein GANs) to domain adaptation to single-cell biology.
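In one dimension the Wasserstein distance has a closed form, which makes a good first sketch (assuming equal sample counts and uniform weights, an illustrative simplification): the optimal coupling simply matches sorted samples.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """W_p distance between two empirical distributions on the real line.

    With equal numbers of samples and uniform weights, the optimal
    transport plan matches sorted samples, so W_p has a closed form.
    """
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert x.shape == y.shape, "equal sample counts assumed in this sketch"
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

# Translating a distribution by c moves it exactly distance c in W_p.
rng = np.random.default_rng(0)
samples = rng.normal(size=1000)
print(wasserstein_1d(samples, samples + 3.0))  # exactly 3.0
```

This translation-sensitivity is precisely what makes the Wasserstein distance useful where KL divergence fails: two non-overlapping distributions still have a finite, informative distance.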

But most OT applications assume that the underlying data lives in flat Euclidean space. Real data often does not. Probability distributions on spheres (directional data), hyperbolic spaces (hierarchical data), manifolds of positive definite matrices (covariance data), and more general Riemannian manifolds require OT theory that respects the geometry of the space on which distributions are defined.

The 2025-2026 research frontier extends OT to these curved settings, with implications that span from pure mathematics (the geometry of probability spaces) to applied machine learning (training neural networks on non-Euclidean domains).

Neural Optimal Transport on Manifolds

Micheli et al. (2026) present the most direct application: neural OT methods that compute transport maps between distributions on Riemannian manifolds. Standard neural OT methods—which use neural networks to learn transport maps from data—are tailored to Euclidean geometry. They parameterize transport maps as functions from ℝⁿ to ℝⁿ and optimize using Euclidean gradients.

On a Riemannian manifold, this approach fails: the transport map must respect the manifold's geometry—mapping points on the manifold to other points on the manifold while remaining consistent with the curved metric. Micheli et al. solve this by parameterizing transport maps as compositions of exponential maps (which map tangent vectors to manifold points) and geodesic interpolations (which define curves of minimal length on the manifold).
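As a toy illustration of these primitives (not the authors' actual parameterization), here is a hedged sketch of the exponential map, its inverse, and geodesic interpolation on the unit sphere S² ⊂ ℝ³, one of the few manifolds where all three have simple closed forms:

```python
import numpy as np

def exp_map(p, v):
    """Exponential map on the unit sphere: shoot a geodesic from p
    along tangent vector v (v assumed orthogonal to p)."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p
    return np.cos(theta) * p + np.sin(theta) * (v / theta)

def log_map(p, q):
    """Inverse of exp_map: tangent vector at p pointing toward q,
    with norm equal to the geodesic (great-circle) distance."""
    cos_t = np.clip(np.dot(p, q), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if theta < 1e-12:
        return np.zeros_like(p)
    u = q - cos_t * p          # component of q orthogonal to p
    return theta * u / np.linalg.norm(u)

def geodesic(p, q, t):
    """Constant-speed geodesic from p to q, evaluated at t in [0, 1]."""
    return exp_map(p, t * log_map(p, q))

p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
mid = geodesic(p, q, 0.5)  # midpoint of the quarter great circle
```

Note the key property a Euclidean parameterization would violate: every output of `geodesic` stays on the sphere, whereas the straight-line average `(p + q) / 2` leaves it.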

The practical significance: any application where data naturally lives on a manifold—molecular conformations on the space of rotation matrices, brain connectivity patterns on the space of positive definite matrices, wind directions on the sphere—can now benefit from neural OT methods that respect the data's intrinsic geometry.

Geometric-Entropic Optimization for Training

Ferrara (2026) integrates OT with Riemannian gradient methods for neural network training. The core insight: the loss landscape of a neural network can be understood as a problem in the geometry of probability distributions. The model's parameters define a probability distribution over outputs; training moves this distribution toward the target distribution of correct outputs. OT provides a natural measure of the distance between these distributions.

The geometric-entropic formulation adds an entropic regularization term to the OT distance—smoothing the optimization landscape and enabling efficient gradient computation. The Riemannian gradient methods then navigate this landscape along geodesics of the parameter manifold, respecting the natural geometry of the optimization problem.

This is more than a mathematical curiosity. Standard gradient descent treats all parameter directions equally, but the parameter space of a neural network has a natural Riemannian structure (the Fisher information metric) where some directions correspond to large changes in model behavior and others to negligible changes. Riemannian optimization that respects this structure can converge faster and to better optima than standard methods.
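A minimal sketch of this preconditioning, assuming a toy Gaussian model whose Fisher matrix is known in closed form (the function name is illustrative): a natural-gradient step applies the inverse Fisher matrix to the Euclidean gradient.

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.1):
    """One natural-gradient step: precondition the Euclidean gradient
    by the inverse Fisher matrix, theta <- theta - lr * F^{-1} grad."""
    return theta - lr * np.linalg.solve(fisher, grad)

# Toy model: Gaussian N(mu, sigma^2) with parameters theta = (mu, sigma).
# Its Fisher information matrix is diag(1/sigma^2, 2/sigma^2), so the
# same Euclidean gradient produces very different steps at different sigma.
mu, sigma = 0.0, 10.0
theta = np.array([mu, sigma])
grad = np.array([1.0, 1.0])
F = np.diag([1.0 / sigma**2, 2.0 / sigma**2])
print(natural_gradient_step(theta, grad, F))  # [-10.   5.]
# When sigma is large, a unit change in mu barely changes the
# distribution, so the natural gradient takes a much larger step in mu.
```

This is the sense in which "some directions correspond to negligible changes": the Fisher metric measures step size in distribution space, not parameter space.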

Operator Learning on Complex Geometries

Li et al. extend OT to operator learning—learning mappings between function spaces, as required for solving partial differential equations (PDEs) on complex geometries. Their approach generalizes discretized meshes (the standard representation of PDE domains) to mesh density functions: probability distributions over the domain, which OT can compare and transport between different discretizations.

This abstraction is practically valuable because it enables operator learning that is mesh-independent: the learned operator works for any mesh resolution or configuration, not just the specific mesh used during training. For engineering applications where PDE solutions must be computed on different meshes for different geometries, mesh-independent operators avoid the prohibitive cost of retraining for each geometry.
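A hedged 1D sketch of the mesh-density idea (the function names and the reduction via cumulative distribution functions are illustrative, not Li et al.'s construction): a mesh becomes a weighted point cloud, and two discretizations can then be compared with a closed-form 1D Wasserstein distance regardless of their resolutions.

```python
import numpy as np

def mesh_to_density(nodes):
    """Treat a 1D mesh as a discrete probability measure: one atom per
    node, weighted by the size of its surrounding cell."""
    nodes = np.sort(np.asarray(nodes, float))
    edges = np.concatenate(([nodes[0]], (nodes[:-1] + nodes[1:]) / 2, [nodes[-1]]))
    w = np.diff(edges)
    return nodes, w / w.sum()

def w1_between_meshes(nodes_a, nodes_b, n_grid=4000):
    """1-Wasserstein distance between two mesh density functions, using
    the 1D identity W1(mu, nu) = integral of |F_mu - F_nu| dx."""
    (xa, wa), (xb, wb) = mesh_to_density(nodes_a), mesh_to_density(nodes_b)
    grid = np.linspace(min(xa[0], xb[0]), max(xa[-1], xb[-1]), n_grid)
    def cdf(atoms, weights):
        cw = np.concatenate(([0.0], np.cumsum(weights)))
        return cw[np.searchsorted(atoms, grid, side="right")]
    Fa, Fb = cdf(xa, wa), cdf(xb, wb)
    return float(np.sum(np.abs(Fa - Fb)[:-1] * np.diff(grid)))

coarse = np.linspace(0.0, 1.0, 11)        # uniform mesh, 11 nodes
fine = np.linspace(0.0, 1.0, 101)         # uniform mesh, 101 nodes
graded = np.linspace(0.0, 1.0, 11) ** 3   # nodes clustered near 0

# Two uniform discretizations are close as densities regardless of
# resolution; a graded mesh is genuinely different.
print(w1_between_meshes(coarse, fine))    # small
print(w1_between_meshes(coarse, graded))  # much larger
```

The point of the abstraction: the coarse and fine meshes above are "the same" as densities, so an operator defined on densities does not care which one it is evaluated on.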

The Geometry of Probability Spaces

Gomes et al. provide foundational mathematics that underpins these applications: an intrinsic development of the Riemannian geometry of the Wasserstein space on the unit circle. Building on work by Otto, Lott, and Villani, they develop the geometric tools—curvature, geodesics, parallel transport—needed to do calculus on the space of probability distributions.

This is pure mathematics with applied consequences. The geometric properties of Wasserstein space determine the convergence behavior of optimization algorithms that operate on probability distributions. Curvature bounds, for instance, control how quickly geodesics spread apart, which in turn governs the stability and convergence rates of gradient-based methods on the space of distributions.
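To make "calculus on the space of probability distributions" concrete, here is a hedged sketch of a Wasserstein geodesic in the simplest setting, the real line rather than the circle studied by Gomes et al.: with equal sample counts, displacement interpolation between two empirical measures amounts to linearly interpolating quantile-matched (sorted) samples.

```python
import numpy as np

def wasserstein_geodesic(x0, x1, t):
    """Displacement interpolation between two empirical measures on R.

    With equal sample counts and uniform weights, the W2 geodesic at
    time t is obtained by linearly interpolating sorted samples.
    """
    x0, x1 = np.sort(np.asarray(x0, float)), np.sort(np.asarray(x1, float))
    return (1 - t) * x0 + t * x1

rng = np.random.default_rng(1)
mu0 = rng.normal(-2.0, 1.0, size=500)
mu1 = rng.normal(+2.0, 1.0, size=500)
mid = wasserstein_geodesic(mu0, mu1, 0.5)
print(mid.mean())  # close to 0, up to sampling noise
```

Unlike the pointwise mixture 0.5·mu0 + 0.5·mu1, which is bimodal, the Wasserstein midpoint is a single bump translated to the middle: the geodesic moves mass rather than blending densities.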

Claims and Evidence

| Claim | Evidence | Verdict |
| --- | --- | --- |
| Standard OT methods fail on non-Euclidean data | Euclidean parameterization violates manifold constraints | ✅ Mathematical fact |
| Riemannian neural OT respects manifold geometry | Micheli et al. demonstrate geodesic-aware transport maps | ✅ Supported |
| OT-based training improves neural network optimization | Ferrara shows convergence benefits of geometric-entropic formulation | ✅ Supported (theoretical + experimental) |
| Mesh-independent operator learning is feasible via OT | Li et al. demonstrate across multiple PDE domains | ✅ Supported |
| Wasserstein geometry on manifolds is fully understood | Active research area; many open questions remain | ⚠️ Partially understood |

Open Questions

  • Computational cost: Riemannian OT is more expensive than Euclidean OT. The exponential and logarithmic maps that replace simple addition and subtraction add computational overhead. Can we develop approximations that maintain geometric fidelity at lower cost?
  • High-dimensional manifolds: Current Riemannian OT methods work well for low-dimensional manifolds (spheres, rotation groups). How do they scale to high-dimensional manifolds (the space of neural network weights, the configuration space of large molecules)?
  • Discrete vs. continuous: Practical applications involve discrete samples from continuous distributions. The interplay between discrete OT (which is a linear program) and continuous OT (which is a PDE) creates approximation errors that are not fully characterized on manifolds.
  • Connections to information geometry: The Fisher information metric provides a natural Riemannian structure on statistical models. How does OT on this specific manifold relate to classical information geometry? Can the two frameworks be unified?
  • Applications to generative modeling: Wasserstein GANs use OT in Euclidean space. Can Riemannian neural OT improve generative modeling for manifold-valued data (molecular conformations, directional statistics, shape spaces)?

What This Means for Your Research

For applied mathematicians, the extension of OT to Riemannian manifolds opens a rich theory that combines differential geometry, probability theory, and optimization—fields that have traditionally developed in relative isolation. The problems are mathematically deep and practically relevant.

For machine learning researchers, Riemannian OT provides tools for domains where Euclidean assumptions are inappropriate—and many important domains are non-Euclidean: rotations in robotics, shapes in computer vision, covariance matrices in brain imaging, phylogenetic trees in biology.

For computational scientists, mesh-independent operator learning (Li et al.) addresses a practical bottleneck in scientific computing: the need to retrain models for each new mesh. OT-based abstraction enables learned solvers that generalize across geometries, potentially accelerating engineering design cycles.

The message across all these applications is consistent: geometry matters. When data, models, or optimization landscapes have non-trivial geometric structure, methods that respect that structure outperform those that ignore it. Optimal transport provides the mathematical language for expressing and exploiting geometric structure in probability and optimization.

References

[1] Micheli, A., Cao, Y., Monod, A. (2026). Riemannian Neural Optimal Transport. arXiv:2602.03566.
[2] Ferrara, M. (2026). Geometric-Entropic Optimization: Integrating Optimal Transport with Riemannian Gradient Methods for Neural Network Training. Journal of Optimization Theory and Applications.
[3] Li, X., Li, Z., Kovachki, N. (2025). Geometric Operator Learning with Optimal Transport. arXiv:2507.20065.
[4] Gomes, A., Rodrigues, C., San Martin, L. (2025). The Riemannian geometry of the probability space of the unit circle.
