Riemannian Multinomial Logistic Regression (RMLR)
- RMLR is a generalization of multinomial logistic regression that operates on Riemannian manifolds using intrinsic tools like log and exponential maps.
- It employs a hybrid optimization strategy by combining Euclidean updates in tangent spaces with Riemannian retraction for bias points.
- The framework is applied to SPD manifolds and rotation groups, offering practical advantages in radar, human action, and EEG recognition.
Riemannian Multinomial Logistic Regression (RMLR) generalizes classical multinomial logistic regression (MLR) to data that naturally resides on Riemannian manifolds rather than Euclidean spaces. This paradigm enables the classification of manifold-valued features, such as symmetric positive definite (SPD) matrices or rotation matrices, using probabilistic models and optimization methods that respect the underlying curved geometry. RMLR achieves broad applicability by imposing only minimal geometric requirements—specifically, the existence of a Riemannian metric, a log map, an exponential map, and parallel transport operations—making it suitable for a wide range of non-Euclidean geometries (Chen et al., 2024).
1. Geometric and Mathematical Foundations
RMLR is formulated for a Riemannian manifold $(\mathcal{M}, g)$, where $g_P$ is the metric at $P \in \mathcal{M}$. The key structural requirements are:
- Logarithm map $\mathrm{Log}_P: \mathcal{M} \to T_P\mathcal{M}$ and exponential map $\mathrm{Exp}_P: T_P\mathcal{M} \to \mathcal{M}$,
- Parallel transport $\Gamma_{P \to Q}: T_P\mathcal{M} \to T_Q\mathcal{M}$ (or, for Lie groups, left translation),
- No further manifold-specific structure is required.
Each class $k$ is associated with parameters $(P_k, \tilde{A}_k)$, where $P_k \in \mathcal{M}$ serves as a bias/reference point and $\tilde{A}_k \in T_B\mathcal{M}$ is a weight vector at a fixed base point $B$. The class-specific weight is transported to $T_{P_k}\mathcal{M}$:
- By parallel transport: $A_k = \Gamma_{B \to P_k}\,\tilde{A}_k$,
- Or for Lie groups: $A_k = \mathrm{d}L_{P_k}\,\tilde{A}_k$ by left translation.
The class score for an input $S \in \mathcal{M}$ is $f_k(S) = \langle \mathrm{Log}_{P_k} S,\, A_k \rangle_{P_k}$. The class-conditional probabilities use the softmax function:
$$p(y = k \mid S) = \frac{\exp\big(f_k(S)\big)}{\sum_{j=1}^{K} \exp\big(f_j(S)\big)}.$$
This construction is entirely intrinsic, relying only on the manifold’s geometry, and does not require embedding or vector-space approximations (Chen et al., 2024, Chen et al., 2023).
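As a concrete illustration, here is a minimal NumPy sketch of the score and softmax under the Log-Euclidean metric, where the pullback geometry is flat, parallel transport is trivial, and the score reduces to a Frobenius inner product of matrix logarithms. Function names are illustrative, not from the paper's code:

```python
import numpy as np

def spd_logm(S):
    """Matrix logarithm of an SPD matrix via eigendecomposition (real, symmetric)."""
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def lem_rmlr_scores(S, Ps, As):
    """Class scores under the Log-Euclidean metric:
    f_k(S) = <log(S) - log(P_k), A_k>_F  (flat pullback, so transport is trivial)."""
    logS = spd_logm(S)
    return np.array([np.sum((logS - spd_logm(P)) * A) for P, A in zip(Ps, As)])

def softmax(f):
    """Numerically stable softmax over class scores."""
    z = np.exp(f - f.max())
    return z / z.sum()
```

Note that when $S = P_k$ the log map vanishes and the score is zero, matching the intrinsic picture of $P_k$ as the class's "decision boundary anchor".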
2. RMLR Loss, Gradients, and Optimization
The loss for RMLR is the standard multiclass negative log-likelihood (cross-entropy):
$$\mathcal{L} = -\sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log p(y = k \mid S_i).$$
Gradients are computed w.r.t. both the tangent weight vectors $\tilde{A}_k$ and the manifold bias points $P_k$:
- $\nabla_{\tilde{A}_k} \mathcal{L}$ is computed in the Euclidean tangent space $T_B\mathcal{M}$,
- The Riemannian gradient $\mathrm{grad}_{P_k} \mathcal{L}$ is computed by backpropagating through the log map and the metric $g_{P_k}$,
- Updates use either classic Riemannian SGD or Adam, with:
- $\tilde{A}_k \leftarrow \tilde{A}_k - \eta\, \nabla_{\tilde{A}_k} \mathcal{L}$ (Euclidean step),
- $P_k \leftarrow \mathrm{Exp}_{P_k}\!\big(-\eta\, \mathrm{grad}_{P_k} \mathcal{L}\big)$ (Riemannian retraction).
Batch-wise pseudocode is explicit: compute the transported weights $A_k$ for each class, calculate class scores, apply softmax, evaluate cross-entropy, compute error terms and gradients (including the necessary adjoints of the log map's differential), and update both $\tilde{A}_k$ and $P_k$ (Chen et al., 2024). In the SPD case, closed-form expressions are often available for the exponential and logarithm maps, simplifying optimization (Chen et al., 2023).
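The batch-wise recipe above can be sketched end-to-end for the LEM instantiation, where the model is affine in log-coordinates and the gradients are closed-form. This is a hedged illustration: the learning rates, the symmetrization, and the expm-based bias update are choices specific to this flat case, not the paper's general algorithm:

```python
import numpy as np

def spd_logm(S):
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def spd_expm(X):
    w, V = np.linalg.eigh((X + X.T) / 2)  # symmetrize for numerical safety
    return (V * np.exp(w)) @ V.T

def lem_rmlr_step(batch, labels, Ps, As, lr_A=0.1, lr_P=0.01):
    """One SGD step for LEM-RMLR: forward scores, softmax, cross-entropy,
    closed-form gradients, Euclidean weight update, expm-retracted bias update."""
    logs = [spd_logm(S) for S in batch]      # log maps shared across classes
    logPs = [spd_logm(P) for P in Ps]
    K = len(Ps)
    F = np.array([[np.sum((lS - logPs[k]) * As[k]) for k in range(K)] for lS in logs])
    prob = np.exp(F - F.max(axis=1, keepdims=True))
    prob /= prob.sum(axis=1, keepdims=True)
    err = prob - np.eye(K)[labels]           # softmax cross-entropy error term
    new_As, new_Ps = [], []
    for k in range(K):
        gA = sum(err[i, k] * (logs[i] - logPs[k]) for i in range(len(batch)))
        gP = -sum(err[i, k] for i in range(len(batch))) * As[k]  # d f_k / d logP_k = -A_k
        new_As.append(As[k] - lr_A * gA)
        new_Ps.append(spd_expm(logPs[k] - lr_P * gP))  # retract back to the manifold
    loss = -np.log(prob[np.arange(len(batch)), labels]).mean()
    return new_As, new_Ps, loss
```

Because the LEM chart is global and flat, updating $\log P_k$ Euclideanly and mapping back with `expm` is a valid retraction here; on curved metrics (AIM, BWM) the genuine Riemannian exponential must be used instead.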
3. Instantiations: SPD Manifolds and the Rotation Group
Symmetric Positive Definite (SPD) Manifolds
For $\mathcal{S}_{++}^n$ (the space of $n \times n$ SPD matrices), multiple metrics yield different RMLR variants:
- Log-Euclidean Metric (LEM): the pullback of the Euclidean metric by the matrix logarithm; the score reduces to $\langle \log S - \log P_k,\, A_k \rangle_{\mathrm{F}}$,
- Affine-Invariant Metric (AIM): $\mathrm{Log}_P S = P^{1/2} \log\!\big(P^{-1/2} S P^{-1/2}\big) P^{1/2}$,
- Euclidean Pullback (EM): $\mathrm{Log}_P S = (\mathrm{d}\phi_P)^{-1}\big[\phi(S) - \phi(P)\big]$ for a diffeomorphism $\phi$ onto a flat space,
- Log-Cholesky Metric (LCM): operates on the Cholesky factorization, with log-mapped differences of the lower-triangular factors,
- Bures-Wasserstein Metric (BWM): utilizes Lyapunov operators and specific square-root operations on SPD matrices (Chen et al., 2024).
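To make the contrast between metrics concrete, here is a NumPy sketch of two of the log maps above: the curved affine-invariant one and the flat Euclidean one. The helper names are illustrative; only standard SPD matrix-function identities are assumed:

```python
import numpy as np

def _sym_fun(S, f):
    """Apply a scalar function to an SPD/symmetric matrix via its eigenvalues."""
    w, V = np.linalg.eigh(S)
    return (V * f(w)) @ V.T

def aim_log(P, S):
    """Affine-invariant log map: Log_P S = P^{1/2} log(P^{-1/2} S P^{-1/2}) P^{1/2}."""
    P_half = _sym_fun(P, np.sqrt)
    P_ihalf = _sym_fun(P, lambda w: 1.0 / np.sqrt(w))
    return P_half @ _sym_fun(P_ihalf @ S @ P_ihalf, np.log) @ P_half

def euc_log(P, S):
    """Flat Euclidean 'log map' on symmetric matrices: Log_P S = S - P."""
    return S - P
```

Both maps vanish at $S = P$, but only `aim_log` is invariant under congruence $S \mapsto G S G^{\top}$; that invariance is what the affine-invariant RMLR variant buys at extra computational cost.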
Rotation Group SO(n)
For $\mathrm{SO}(n)$ with the bi-invariant metric, the log map left-translated to the identity is $\log(P^{\top} R)$, a skew-symmetric matrix. The score is $f_k(R) = \langle \log(P_k^{\top} R),\, A_k \rangle_{\mathrm{F}}$ with skew-symmetric weights $A_k$ (Chen et al., 2024).
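The rotation-group score admits a short sketch as well; the skew-symmetrization after `logm` is a numerical safeguard, and the function names are illustrative:

```python
import numpy as np
from scipy.linalg import logm

def so_log(P, R):
    """Log map on SO(n) with the bi-invariant metric, pulled back to the
    identity by left translation: log(P^T R) is skew-symmetric."""
    X = np.real(logm(P.T @ R))
    return (X - X.T) / 2  # project out numerical asymmetry

def so_rmlr_scores(R, Ps, As):
    """f_k(R) = <log(P_k^T R), A_k>_F with skew-symmetric A_k."""
    return np.array([np.sum(so_log(P, R) * A) for P, A in zip(Ps, As)])
```

For $n = 2$ this reduces to comparing a single rotation angle, which is a useful sanity check when debugging.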
4. Relation to Classical and Natural Gradient MLR
RMLR strictly generalizes classical MLR by making the classifier fully compatible with the manifold structure of the feature space. For example, on the SPD manifold with LEM, RMLR recovers the "LogEig" classifier widely used in SPD networks: LEM–RMLR with Riemannian updates for the bias points and Euclidean updates for the weights is equivalent to a linear classifier acting on matrix logarithms (Chen et al., 2023).
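The LogEig equivalence follows from expanding the flat score: $\langle \log S - \log P_k, A_k \rangle_{\mathrm{F}} = \langle \log S, A_k \rangle_{\mathrm{F}} + b_k$ with $b_k = -\langle \log P_k, A_k \rangle_{\mathrm{F}}$, i.e. an affine classifier on $\log S$. A small numerical check of this identity (function names illustrative):

```python
import numpy as np

def spd_logm(S):
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def lem_score(S, P, A):
    """Intrinsic LEM-RMLR score with bias point P."""
    return np.sum((spd_logm(S) - spd_logm(P)) * A)

def logeig_score(S, A, b):
    """'LogEig' form: an affine classifier applied to log(S)."""
    return np.sum(spd_logm(S) * A) + b
```

The two parameterizations agree exactly when $b_k = -\langle \log P_k, A_k \rangle_{\mathrm{F}}$, which is why the bias point $P_k$ and the scalar bias $b_k$ are interchangeable in the LEM case only.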
The Riemannian framework highlights the close relationship between optimization in RMLR and natural gradient approaches for classical MLR. On dually flat manifolds (which include the family of MLR distributions), the natural gradient is efficiently computable via dual coordinates, leading to scalable algorithms such as DSNGD with per-iteration cost linear in the number of parameters and provable convergence guarantees (Sánchez-López et al., 2020). This formalism motivates manifold generalizations of natural gradient descent for RMLR and the use of Riemannian-backbone optimization (Sánchez-López et al., 2020).
5. Implementation Aspects and Practical Recommendations
Practical deployment of RMLR involves several key considerations:
- Initialization: $P_k$ can be initialized as the Fréchet mean of the class samples or as the identity; $\tilde{A}_k$ can be zero or a small random symmetric/skew-symmetric matrix, depending on the manifold.
- Complexity: Per-score computation typically involves one log map, with $O(n^3)$ cost for $n \times n$ SPD matrices (dominated by an eigendecomposition). For batch computations, spectral decompositions can be shared across classes. BWM adds Lyapunov solves at similar complexity.
- Integration with Manifold Neural Networks: RMLR is used as a replacement for the standard Euclidean softmax layer in networks operating on manifold-valued features. Parameters are treated as learnable and incorporated into existing Riemannian backbone architectures (Chen et al., 2024).
- Optimization Algorithms: Standard Riemannian SGD and Adam optimizers can be applied, with care taken to use correct retractions (e.g., closed-form exponentials for SPD and $\mathrm{SO}(n)$). Lower learning rates are often needed for the bias points $P_k$ than for the weights $\tilde{A}_k$ (Chen et al., 2024).
- Software Libraries: Libraries such as GeoTorch, GeoOpt, and pymanopt provide tools for handling exponential and logarithm maps, parallel transport, and retraction in practice (Chen et al., 2024).
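As a sketch of the retraction step these libraries implement internally, here is one Riemannian SGD update on $\mathrm{SO}(n)$ using the closed-form group exponential; the helper name and the choice of left-translated tangent projection are this sketch's assumptions:

```python
import numpy as np
from scipy.linalg import expm

def riemannian_sgd_step_so(P, grad, lr):
    """One Riemannian SGD step on SO(n): pull the ambient gradient to the
    identity by left translation, project onto skew-symmetric matrices
    (the tangent space), then retract with the matrix exponential."""
    xi = P.T @ grad
    xi = (xi - xi.T) / 2          # tangent-space projection at the identity
    return P @ expm(-lr * xi)     # exponential-map retraction keeps P on SO(n)
```

The key property, which a plain Euclidean step `P - lr * grad` would violate, is that the updated matrix remains exactly orthogonal with determinant one.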
6. Comparative Analysis, Applications, and Future Directions
RMLR enables intrinsic classifiers that leverage manifold geometry, addressing the limitations of classical methods that ignore or only approximate the geometric structure of non-Euclidean data. Implementation on SPD-valued features has shown empirical benefits for radar recognition, human action recognition, and EEG classification (Chen et al., 2023). The RMLR framework overcomes previous approaches’ restrictions, accommodating various geometries with minimal structure-dependent requirements (Chen et al., 2024).
Furthermore, DSNGD and its variants provide theoretically sound and computationally efficient algorithms for manifold-valued natural gradient optimization in MLR, with provable convergence in the dually flat setting (Sánchez-López et al., 2020). These developments suggest ongoing avenues for theoretical refinement and application expansion of RMLR, particularly as non-Euclidean and manifold-aware architectures become increasingly prevalent in machine learning.
References:
- "RMLR: Extending Multinomial Logistic Regression into General Geometries" (Chen et al., 2024)
- "Riemannian Multinomial Logistics Regression for SPD Neural Networks" (Chen et al., 2023)
- "Dual Stochastic Natural Gradient Descent and convergence of interior half-space gradient approximations" (Sánchez-López et al., 2020)