Riemannian NML: Model Selection on Manifolds

Updated 6 April 2026

Riemannian NML is a framework that extends the NML coding scheme to data on Riemannian manifolds, offering a coordinate-invariant measure of stochastic complexity.
It leverages the invariant properties of the Riemannian volume element and Fisher information to ensure consistency under smooth coordinate transformations.
The framework reduces to standard NML in Euclidean spaces and supports practical applications such as hyperbolic Gaussian modelling of network and hierarchical data.

Riemannian Normalized Maximum Likelihood (Rm-NML) is an extension of the Normalized Maximum Likelihood (NML) universal coding and model selection framework to statistical models where the data space is a Riemannian manifold. It provides a coordinate-invariant, geometrically consistent notion of stochastic complexity and regret minimization in non-Euclidean settings. Rm-NML recovers the conventional NML code-length in Euclidean spaces and enables model selection and information-theoretic analysis for data distributed on general manifolds such as hyperbolic spaces, which are of growing interest in graph and hierarchical data modeling (Fukuzawa et al., 29 Aug 2025).

1. Formal Definition and Distribution Construction

Let $(\mathcal{M}, g)$ be a $D$-dimensional Riemannian manifold with metric $g$ and induced volume element $d\operatorname{vol}(x) = \sqrt{\det g(x)}\,dx$. Consider a parametric family of model densities $p_{\operatorname{vol}}(x|\theta)$, defined with respect to $d\operatorname{vol}(x)$, for $x \in \mathcal{M}$, $\theta \in \Theta$. The maximum-likelihood estimator is

$\hat\theta(x) = \arg\max_{\theta \in \Theta} p_{\operatorname{vol}}(x|\theta).$

The Rm-NML distribution is then

$p_{\rm Rm\text{-}NML}(x) = \frac{ p_{\operatorname{vol}}(x|\hat\theta(x)) }{ \displaystyle \int_\mathcal{M} p_{\operatorname{vol}}(y|\hat\theta(y))\,d\operatorname{vol}(y) }.$

The associated code-length is

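$L_{\rm Rm\text{-}NML}(x) = -\log p_{\rm Rm\text{-}NML}(x) = -\log p_{\operatorname{vol}}(x|\hat\theta(x)) + \log \int_\mathcal{M} p_{\operatorname{vol}}(y|\hat\theta(y))\,d\operatorname{vol}(y).$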

This framework naturally generalizes the Shtarkov NML code to any manifold equipped with a Riemannian measure.
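As a concrete illustration of the construction above, the following minimal sketch evaluates the Rm-NML code-length for a single observation on the unit sphere $S^2$. The model is an assumption made for illustration and is not taken from the source: a von Mises–Fisher-type density proportional to $\exp(\kappa \cos\angle(x,\mu))$ with known concentration $\kappa$, so that only the location $\mu$ is estimated and $\hat\mu(x) = x$. The names `midpoint`, `sqrt_det_g`, and `KAPPA` are hypothetical helpers. In spherical coordinates the volume element is $\sin\theta\,d\theta\,d\phi$, and this factor is what distinguishes integration against $d\operatorname{vol}$ from naive coordinate integration.

```python
import math

KAPPA = 2.0  # concentration, treated as known (an assumption of this toy example)

def midpoint(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def sqrt_det_g(theta):
    """Volume density of the round metric on S^2 in (theta, phi) coordinates."""
    return math.sin(theta)

# Normalizer of exp(kappa * cos(angle to mu)) with respect to d vol.
# By rotational symmetry mu can be placed at the north pole.
Z = 2.0 * math.pi * midpoint(
    lambda t: math.exp(KAPPA * math.cos(t)) * sqrt_det_g(t), 0.0, math.pi
)

def max_likelihood_density(theta, phi):
    """Fitted density p_vol(x | mu_hat(x)); with mu_hat(x) = x the angle is zero."""
    return math.exp(KAPPA) / Z

# Shtarkov-type normalizer: the fitted density integrated against
# d vol = sin(theta) dtheta dphi (the phi integral contributes 2 * pi).
normalizer = 2.0 * math.pi * midpoint(
    lambda t: max_likelihood_density(t, 0.0) * sqrt_det_g(t), 0.0, math.pi
)

def rm_nml_code_length(theta, phi):
    """Code-length in nats: -log p_vol(x | mu_hat(x)) + log normalizer."""
    return -math.log(max_likelihood_density(theta, phi)) + math.log(normalizer)

code_length = rm_nml_code_length(1.0, 0.3)
```

Because the fitted density is constant over the sphere, the Rm-NML distribution in this toy case is uniform with respect to $d\operatorname{vol}$, and the code-length equals $\log(4\pi)$ nats for every $x$ up to quadrature error.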

2. Coordinate Invariance and Role of the Fisher Information

The Rm-NML code-length is invariant under smooth coordinate transformations due to two geometric properties:

The volume element $d\operatorname{vol}(x) = \sqrt{\det g(x)}\,dx$ is invariant under changes of chart, because the factor $\sqrt{\det g(x)}$ absorbs the Jacobian of the transformation; this ensures that $p_{\operatorname{vol}}(x|\theta)$ is a scalar function on $\mathcal{M}$.
In asymptotic normalizing constant calculations, the Fisher information metric $I(\theta)$ on $\Theta$ yields the Jeffreys prior $\sqrt{\det I(\theta)}\,d\theta$, which is itself invariant under reparameterization.

Explicitly, if $\theta$ and $\eta$ are smooth coordinate charts on $\Theta$: $\sqrt{\det I_\eta(\eta)}\,d\eta = \sqrt{\det I_\theta(\theta)}\,d\theta,$ reflecting coordinate invariance in the parameter space. This guarantees that Rm-NML is well-defined and interpretable regardless of coordinate representation.
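The reparameterization invariance of the Jeffreys measure can be checked numerically. The sketch below uses an illustrative model that is not part of the source: the zero-mean Gaussian scale family $N(0,\sigma^2)$ with $\sigma$ restricted to a hypothetical interval, written in three charts ($\sigma$, $\tau = \log\sigma$, and $v = \sigma^2$). The Fisher informations used are the standard ones for this family, and the integral of $\sqrt{I}$ should agree across charts.

```python
import math

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

SIGMA_MIN, SIGMA_MAX = 0.5, 4.0  # hypothetical parameter range

# Fisher information of N(0, sigma^2) in three parameterizations.
def fisher_sigma(s):       # chart: sigma
    return 2.0 / s ** 2

def fisher_log_sigma(t):   # chart: tau = log(sigma)
    return 2.0

def fisher_variance(v):    # chart: v = sigma^2
    return 1.0 / (2.0 * v ** 2)

# Integrate sqrt(I) over the same parameter region expressed in each chart.
jeffreys_sigma = midpoint(
    lambda s: math.sqrt(fisher_sigma(s)), SIGMA_MIN, SIGMA_MAX)
jeffreys_log_sigma = midpoint(
    lambda t: math.sqrt(fisher_log_sigma(t)),
    math.log(SIGMA_MIN), math.log(SIGMA_MAX))
jeffreys_variance = midpoint(
    lambda v: math.sqrt(fisher_variance(v)), SIGMA_MIN ** 2, SIGMA_MAX ** 2)

# Closed form shared by all three charts: sqrt(2) * log(sigma_max / sigma_min).
jeffreys_exact = math.sqrt(2.0) * math.log(SIGMA_MAX / SIGMA_MIN)
```

All three numerical values agree with the closed form up to quadrature error, which is the one-dimensional instance of the identity stated above.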

3. Reduction to Ordinary NML in Euclidean Spaces

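If $\mathcal{M} = \mathbb{R}^D$ with the standard Euclidean metric, $\sqrt{\det g(x)} = 1$ and $d\operatorname{vol}(x) = dx$, so the Rm-NML reduces exactly to conventional NML: $p_{\rm Rm\text{-}NML}(x) = \frac{p(x|\hat\theta(x))}{\int_{\mathbb{R}^D} p(y|\hat\theta(y))\,dy},$ matching the original Shtarkov code-length and ensuring compatibility with classical minimum description length (MDL) theory.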

4. Asymptotic and Computational Properties

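For sample size $n \to \infty$ and under standard regularity conditions, the normalizing constant $C_n$ can be approximated (via saddle-point asymptotics) as: $\log C_n = \frac{k}{2}\log\frac{n}{2\pi} + \log\int_\Theta \sqrt{\det I(\theta)}\,d\theta + o(1),$ where $k = \dim\Theta$ and $I(\theta)$ is the Fisher information: $I(\theta) = -\mathbb{E}_{x \sim p_{\operatorname{vol}}(\cdot|\theta)}\!\left[\nabla_\theta^2 \log p_{\operatorname{vol}}(x|\theta)\right].$ This result replaces Lebesgue measure with the manifold volume and incorporates explicit chart transformations, mirroring the derivation of Rissanen (1996) for stochastic complexity.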
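To give a feel for the accuracy of this kind of approximation, the sketch below compares the exact NML normalizer with the classical Rissanen-type expression $\frac{k}{2}\log\frac{n}{2\pi} + \log\int\sqrt{\det I(\theta)}\,d\theta$ in the simplest classical setting. The Bernoulli model is an assumption chosen for illustration (it is not discussed in the source); for it $k = 1$ and the Jeffreys integral equals $\pi$. Function names are hypothetical.

```python
import math

def k_log_ratio(k, n):
    """k * log(k / n), with the convention 0 * log(0) = 0."""
    return 0.0 if k == 0 else k * math.log(k / n)

def exact_log_normalizer(n):
    """log of sum_k C(n, k) (k/n)^k (1 - k/n)^(n - k), computed in log space."""
    terms = []
    for k in range(n + 1):
        log_binom = (math.lgamma(n + 1) - math.lgamma(k + 1)
                     - math.lgamma(n - k + 1))
        terms.append(log_binom + k_log_ratio(k, n) + k_log_ratio(n - k, n))
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(t - m) for t in terms))

def asymptotic_log_normalizer(n, num_params=1, jeffreys_integral=math.pi):
    """(k/2) log(n / (2 pi)) + log(integral of sqrt(det I))."""
    return (0.5 * num_params * math.log(n / (2.0 * math.pi))
            + math.log(jeffreys_integral))

def gap(n):
    """Exact minus asymptotic log-normalizer; this is the o(1) remainder."""
    return exact_log_normalizer(n) - asymptotic_log_normalizer(n)

gaps = {n: gap(n) for n in (100, 1000, 10000)}
```

The remainder shrinks as $n$ grows, consistent with the $o(1)$ term. On a manifold the same comparison would need the exact normalizer integrated against $d\operatorname{vol}$, which is generally more expensive to compute.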

5. Riemannian Symmetric Spaces and Simplifications

When $\mathcal{M}$ is a Riemannian symmetric space and the density $p_{\operatorname{vol}}(x|\mu,\sigma)$ depends on $(x,\mu)$ only via their geodesic distance $d(x,\mu)$ plus Euclidean nuisance parameters $\sigma$, the Fisher information matrix exhibits a block-diagonal structure: $I(\mu,\sigma) = \begin{pmatrix} I_\mu(\sigma) & 0 \\ 0 & I_\sigma(\sigma) \end{pmatrix},$ where $I_\mu(\sigma)$ (expressed in an orthonormal frame at $\mu$) and $I_\sigma(\sigma)$ are independent of $\mu$ by manifold homogeneity. Consequently,

$\int \sqrt{\det I(\mu,\sigma)}\,d\operatorname{vol}(\mu)\,d\sigma = \operatorname{Vol}(\mathcal{M}_\mu)\int \sqrt{\det I_\mu(\sigma)\,\det I_\sigma(\sigma)}\,d\sigma, \quad \text{with } \mathcal{M}_\mu \subseteq \mathcal{M} \text{ the region over which } \mu \text{ ranges}.$

In this setting, integration over the data manifold is replaced by the volume of the parameter manifold, simplifying computation substantially.

6. Explicit Hyperbolic Gaussian Case

For $D$-dimensional hyperbolic space $\mathbb{H}^D$ (curvature $-1$), the Riemannian Gaussian (R-GD) density is defined as: $p_{\operatorname{vol}}(x|\mu,\sigma) = \frac{1}{Z(\sigma)}\exp\!\left(-\frac{d(x,\mu)^2}{2\sigma^2}\right),$ with the normalizing factor

$Z(\sigma) = \omega_{D-1}\int_0^\infty \exp\!\left(-\frac{r^2}{2\sigma^2}\right)\sinh^{D-1}(r)\,dr, \qquad \omega_{D-1} = \frac{2\pi^{D/2}}{\Gamma(D/2)}.$

For data $x^n = (x_1,\dots,x_n)$, the MLEs are the Riemannian–Fréchet mean $\hat\mu = \arg\min_{\mu}\sum_{i=1}^n d(x_i,\mu)^2$ and variance estimator $\hat\sigma$. The fitted log-likelihood is

$\log p_{\operatorname{vol}}(x^n|\hat\mu,\hat\sigma) = -n\log Z(\hat\sigma) - \frac{1}{2\hat\sigma^2}\sum_{i=1}^n d(x_i,\hat\mu)^2.$
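The fitted log-likelihood can be evaluated once the normalizing factor of the Riemannian Gaussian is available. The sketch below makes several assumptions that should be checked against the paper: dimension $D = 2$, curvature $-1$, and a density proportional to $\exp(-d(x,\mu)^2/(2\sigma^2))$ with respect to $d\operatorname{vol}$. Under those assumptions the normalizer is $2\pi\int_0^\infty e^{-r^2/(2\sigma^2)}\sinh r\,dr$, which has the closed form $2\pi\sqrt{\pi/2}\,\sigma\,e^{\sigma^2/2}\operatorname{erf}(\sigma/\sqrt{2})$. The function names and the truncation radius are hypothetical choices.

```python
import math

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def log_normalizer_h2(sigma):
    """log Z(sigma) for D = 2, curvature -1, by quadrature in geodesic polar
    coordinates, where the radial volume density is sinh(r)."""
    r_max = sigma ** 2 + 12.0 * sigma  # integrand is negligible beyond this
    integral = midpoint(
        lambda r: math.exp(-r * r / (2.0 * sigma ** 2)) * math.sinh(r),
        0.0, r_max)
    return math.log(2.0 * math.pi * integral)

def log_normalizer_h2_closed_form(sigma):
    """Closed form of the same quantity, used here as a cross-check."""
    return (math.log(2.0 * math.pi * math.sqrt(math.pi / 2.0) * sigma)
            + sigma ** 2 / 2.0
            + math.log(math.erf(sigma / math.sqrt(2.0))))

def fitted_log_likelihood(distances, sigma):
    """-n log Z(sigma) - sum_i d_i^2 / (2 sigma^2), where d_i is the geodesic
    distance from x_i to the fitted mean."""
    n = len(distances)
    sq_sum = sum(d * d for d in distances)
    return -n * log_normalizer_h2(sigma) - sq_sum / (2.0 * sigma ** 2)

example = fitted_log_likelihood([0.5, 1.0, 0.2], sigma=1.0)
```

For other dimensions the radial density becomes $\sinh^{D-1}(r)$ and the prefactor becomes the surface area of the unit sphere $S^{D-1}$; the quadrature step carries over unchanged.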

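Applying the symmetric-space formula, the Rm-NML normalizing constant (Corollary 6.1 (Fukuzawa et al., 29 Aug 2025)) is

$\log C_n \approx \frac{D+1}{2}\log\frac{n}{2\pi} + \log \operatorname{Vol}(B_R) + \log\int \sqrt{\det I_\mu(\sigma)\,\det I_\sigma(\sigma)}\,d\sigma,$

where $\operatorname{Vol}(B_R)$ is the volume of the geodesic ball $B_R$ of radius $R$ (for restricted $\mu$), and

$\operatorname{Vol}(B_R) = \omega_{D-1}\int_0^R \sinh^{D-1}(r)\,dr, \qquad \omega_{D-1} = \frac{2\pi^{D/2}}{\Gamma(D/2)}.$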
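The geodesic ball volume is a convenient place to see coordinate invariance of the Riemannian volume element in action. The sketch below assumes $D = 2$ and curvature $-1$ (assumptions for illustration) and computes the area of a geodesic ball of radius $R$ in two different charts: geodesic polar coordinates, where $\sqrt{\det g} = \sinh r$, and the Poincaré disk, where $\sqrt{\det g} = 4\rho/(1-\rho^2)^2$ and the ball corresponds to $\rho \le \tanh(R/2)$. Both should match the closed form $2\pi(\cosh R - 1)$. Function names are hypothetical.

```python
import math

def midpoint(f, a, b, n=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def ball_area_polar(radius):
    """Geodesic polar chart: g = dr^2 + sinh(r)^2 dphi^2."""
    return 2.0 * math.pi * midpoint(math.sinh, 0.0, radius)

def ball_area_poincare(radius):
    """Poincare disk chart: g = 4 (drho^2 + rho^2 dphi^2) / (1 - rho^2)^2."""
    rho_max = math.tanh(radius / 2.0)  # Euclidean radius of the same ball
    return 2.0 * math.pi * midpoint(
        lambda rho: 4.0 * rho / (1.0 - rho * rho) ** 2, 0.0, rho_max)

def ball_area_exact(radius):
    """Closed form for the hyperbolic plane of curvature -1."""
    return 2.0 * math.pi * (math.cosh(radius) - 1.0)

R = 2.0  # hypothetical restriction radius for the location parameter
areas = (ball_area_polar(R), ball_area_poincare(R), ball_area_exact(R))
```

The two charts give the same area even though their coordinate integrands look nothing alike, which is the property that makes the normalizing constant independent of the chosen representation of hyperbolic space.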

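The full Rm-NML code-length for hyperbolic Gaussian models is then the negative fitted log-likelihood plus the log normalizing constant $\log C_n$:

$L_{\rm Rm\text{-}NML}(x^n) = n\log Z(\hat\sigma) + \frac{1}{2\hat\sigma^2}\sum_{i=1}^n d(x_i,\hat\mu)^2 + \log C_n.$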

7. Practical Computation and Applications

The remaining integrals in the normalizing constant (over Euclidean parameters or geodesic ball volumes) are amenable to numerical quadrature, Monte Carlo, or, in the Euclidean limit, Fourier methods. The Fréchet mean on $\mathbb{H}^D$ is computed by Riemannian gradient descent (Bonnabel 2013), while $\sigma$ has a closed-form estimator.
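A minimal sketch of the Fréchet mean step is given below. It uses the hyperboloid (Lorentz) model of hyperbolic space with curvature $-1$ and plain fixed-step Riemannian gradient descent; the choice of model, the step size, the stopping tolerance, and all function names are assumptions made for illustration and are not prescribed by the source. Each iteration averages the log-mapped data in the tangent space at the current estimate and moves along the exponential map.

```python
import math

def minkowski_dot(u, v):
    """Lorentzian inner product <u, v>_L = -u0*v0 + sum_i ui*vi."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))

def hyp_distance(x, y):
    """Geodesic distance on the hyperboloid <x, x>_L = -1, x0 > 0."""
    return math.acosh(max(1.0, -minkowski_dot(x, y)))

def log_map(mu, x):
    """Tangent vector at mu pointing to x, with length d(mu, x)."""
    c = max(1.0, -minkowski_dot(mu, x))  # equals cosh(d)
    d = math.acosh(c)
    if d < 1e-12:
        return [0.0] * len(mu)
    scale = d / math.sinh(d)
    return [scale * (xi - c * mi) for xi, mi in zip(x, mu)]

def exp_map(mu, v):
    """Point reached from mu by following the geodesic with velocity v."""
    norm = math.sqrt(max(0.0, minkowski_dot(v, v)))
    if norm < 1e-12:
        return list(mu)
    out = [math.cosh(norm) * mi + math.sinh(norm) * vi / norm
           for mi, vi in zip(mu, v)]
    # Re-project onto the hyperboloid to limit floating-point drift.
    out[0] = math.sqrt(1.0 + sum(c * c for c in out[1:]))
    return out

def frechet_mean(points, step=0.5, iters=200, tol=1e-12):
    """Fixed-step Riemannian gradient descent on the sum of squared distances."""
    mu = list(points[0])
    n = len(points)
    for _ in range(iters):
        logs = [log_map(mu, x) for x in points]
        direction = [sum(col) / n for col in zip(*logs)]  # descent direction
        if math.sqrt(max(0.0, minkowski_dot(direction, direction))) < tol:
            break
        mu = exp_map(mu, [step * g for g in direction])
    return mu

origin = [1.0, 0.0, 0.0]
data = [exp_map(origin, v) for v in
        ([0.0, 1.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, 0.5], [0.0, 0.0, -0.5])]
mean = frechet_mean(data)
```

The example data are symmetric about the hyperboloid origin, so the estimate should converge to `[1, 0, 0]`. A step size below 1 is used because the Hessian of the squared distance exceeds the identity in negative curvature; for widely spread data a smaller step or a line search may be needed.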

Once implemented, the Rm-NML framework enables fully coordinate-invariant model selection, regret minimization, and MDL-based coding on manifold-valued data. Notably:

For hierarchical data or graph embeddings in hyperbolic space, hyperbolic-Gaussian Rm-NML enables selection of both embedding dimension $D$ and curvature (via the geodesic radius $R$), fully respecting the underlying data geometry.
The framework generalizes to any manifold admitting a Riemannian structure, supporting applications where geometric structure is intrinsic to the data.

A plausible implication is that Rm-NML facilitates rigorous model selection and information-theoretic analyses for emerging applications in geometric deep learning, network analysis, and manifold-based statistical inference, particularly where non-Euclidean geometries are required (Fukuzawa et al., 29 Aug 2025).

References

Fukuzawa et al. (2025). Normalized Maximum Likelihood Code-Length on Riemannian Manifold Data Spaces.
