
Riemannian Normal Poincaré Ball VAE

Updated 8 June 2026
  • The paper introduces a VAE framework that leverages Riemannian normal distributions on the Poincaré ball to efficiently encode hierarchical, tree-like data.
  • It employs hyperbolic geometry with geodesic reparameterization and novel chart methods (RC and bExp) to achieve numerically stable and semantically invariant latent embeddings.
  • Empirical evaluations demonstrate improved generalization and lower distortion in hierarchical tasks compared to traditional Euclidean VAEs.

The Riemannian Normal Poincaré Ball VAE is a framework for probabilistic generative modeling wherein the latent variable structure operates in a hyperbolic geometry—specifically, the Poincaré ball model—rather than a conventional Euclidean space. It employs the Riemannian normal distribution defined with respect to the Poincaré ball geometry, allowing for the efficient representation of hierarchical and tree-like data, with theoretical and empirical advantages over Euclidean latent embeddings. Recent advancements have introduced Radial Compensation (RC) and Balanced-Exponential (bExp) charts, enabling numerically robust training and semantically invariant priors on these manifolds (Mathieu et al., 2019, Papamichals et al., 18 Nov 2025).

1. Hyperbolic Geometry and the Poincaré Ball

Let $d$ be the latent dimension and $c > 0$ the absolute (positive) curvature. The Poincaré ball of dimension $d$ and (negative) curvature $-c$ is defined as

$B^c_d = \left\{ z \in \mathbb{R}^d: \|z\| < \frac{1}{\sqrt{c}} \right\}$

with the Riemannian metric

$g^c_p(z) = [\lambda^c_z]^2 g_e,\qquad \lambda^c_z = \frac{2}{1 - c\|z\|^2}$

where $g_e$ denotes the Euclidean inner product. The geodesic distance between $z, y \in B^c_d$ is

$d^c_p(z, y) = \frac{1}{\sqrt{c}} \operatorname{arccosh} \left( 1 + \frac{2c \|z-y\|^2}{(1-c\|z\|^2)(1-c\|y\|^2)} \right)$

The volume element in this geometry scales exponentially with radius, a property that fundamentally aligns with the exponential branching of hierarchical data.
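The metric quantities above can be checked numerically. The following is a minimal NumPy sketch, not taken from the cited papers; the function names `conformal_factor` and `poincare_distance` are illustrative choices. It evaluates the conformal factor and the geodesic distance, and shows how distances from the origin grow without bound as a point approaches the boundary of the ball.

```python
import numpy as np

def conformal_factor(z, c=1.0):
    """Conformal factor lambda_z^c = 2 / (1 - c * ||z||^2)."""
    return 2.0 / (1.0 - c * np.dot(z, z))

def poincare_distance(z, y, c=1.0):
    """Geodesic distance on the Poincaré ball via the arccosh formula."""
    sq_diff = np.dot(z - y, z - y)
    denom = (1.0 - c * np.dot(z, z)) * (1.0 - c * np.dot(y, y))
    return np.arccosh(1.0 + 2.0 * c * sq_diff / denom) / np.sqrt(c)

origin = np.zeros(2)
for euclidean_radius in (0.5, 0.9, 0.99):
    point = np.array([euclidean_radius, 0.0])
    # Geodesic distance from the origin diverges as the point nears the boundary.
    print(euclidean_radius, poincare_distance(origin, point))
```

For a point at Euclidean radius $\rho$ from the origin, the formula reduces to $\frac{2}{\sqrt{c}} \operatorname{artanh}(\sqrt{c}\,\rho)$, which is a convenient sanity check.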

2. The Riemannian Normal Distribution on the Poincaré Ball

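The Riemannian normal ("maximum-entropy" normal) distribution $p_R(z; \mu, \sigma)$ on $B^c_d$ is defined by

$p_R(z; \mu, \sigma)\, d\mathrm{vol}(z) = \frac{1}{Z^R(\sigma)} \exp\left( -\frac{d^c_p(\mu, z)^2}{2\sigma^2} \right) d\mathrm{vol}(z)$

where $\mu \in B^c_d$ is the Fréchet mean, $\sigma > 0$ the scale, and $d\mathrm{vol}(z) = (\lambda^c_z)^d\, dz$ is the Riemannian volume. The normalization constant is

$Z^R(\sigma) = \frac{2\pi^{d/2}}{\Gamma(d/2)} \int_0^\infty \exp\left( -\frac{r^2}{2\sigma^2} \right) \left( \frac{\sinh(\sqrt{c}\, r)}{\sqrt{c}} \right)^{d-1} dr$

where $r$ ranges over the geodesic distance $d^c_p(\mu, z)$ to the mean; the density depends on $z$ only through this distance, ensuring isotropy.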
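As an illustration of the normalization constant, the sketch below approximates it by quadrature. It assumes the polar-coordinate form $Z^R(\sigma) = \frac{2\pi^{d/2}}{\Gamma(d/2)} \int_0^\infty e^{-r^2/(2\sigma^2)} \left(\sinh(\sqrt{c}\,r)/\sqrt{c}\right)^{d-1} dr$, which follows from writing the hyperbolic volume element in geodesic polar coordinates. The function names, the truncation radius, and the grid size are illustrative assumptions, not part of the cited work.

```python
import math
import numpy as np

def unit_sphere_area(d):
    """Surface area of the unit sphere S^{d-1} embedded in R^d."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)

def riemannian_normal_z(sigma, d, c=1.0, num_points=20001):
    """Approximate Z^R(sigma) with the trapezoidal rule on a truncated radius grid."""
    sqrt_c = math.sqrt(c)
    # The integrand peaks near (d - 1) * sqrt(c) * sigma^2; integrate well past it.
    r_max = (d - 1) * sqrt_c * sigma ** 2 + 12.0 * sigma
    r = np.linspace(0.0, r_max, num_points)
    integrand = np.exp(-r ** 2 / (2.0 * sigma ** 2)) * (np.sinh(sqrt_c * r) / sqrt_c) ** (d - 1)
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))
    return unit_sphere_area(d) * integral

def riemannian_normal_z_2d(sigma, c=1.0):
    """Closed form of the same integral for d = 2, used only as a cross-check."""
    sqrt_c = math.sqrt(c)
    radial = (sigma * math.sqrt(math.pi / 2.0) * math.exp(c * sigma ** 2 / 2.0)
              * math.erf(sqrt_c * sigma / math.sqrt(2.0)) / sqrt_c)
    return 2.0 * math.pi * radial

print(riemannian_normal_z(1.0, d=2), riemannian_normal_z_2d(1.0))
```

For large $d$ or large $\sigma$ the integrand can overflow in floating point, so a log-domain evaluation would be the safer choice in practice.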

3. Variational Auto-Encoders in the Poincaré Ball and Riemannian ELBO

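Given data $x$, the generative process posits latent variables $z \in B^c_d$ sampled from a Riemannian normal prior $p(z) = p_R(z; \mu_0, \sigma_0)$ and observations modeled by $p_\theta(x \mid z)$. The inference or posterior distribution $q_\phi(z \mid x)$ is also a Riemannian normal, $q_\phi(z \mid x) = p_R(z; \mu_\phi(x), \sigma_\phi(x))$. The evidence lower bound (ELBO) in this setting is

$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\left[ \log p_\theta(x \mid z) \right] - \mathrm{KL}\left( q_\phi(z \mid x) \,\|\, p(z) \right)$

where the KL-divergence $\mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z))$ is evaluated using densities calculated with respect to the Riemannian volume (Mathieu et al., 2019). Sampling from the Riemannian normal employs a reparameterization with geodesic polar coordinates: sample $r$ (distance) and $\alpha$ (direction, uniform on the unit sphere), then compute $z = \exp^c_\mu\left( \frac{r}{\lambda^c_\mu} \alpha \right)$.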
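The polar-coordinate sampling step can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the exponential map is written with Möbius addition, $\exp^c_\mu(v) = \mu \oplus_c \left( \tanh\left(\sqrt{c}\,\lambda^c_\mu \|v\| / 2\right) \frac{v}{\sqrt{c}\,\|v\|} \right)$, and the radius is drawn from a discretized grid as a simple stand-in for the rejection sampler a real implementation would use. The grid sampler is not differentiable, and all function names are hypothetical.

```python
import numpy as np

def mobius_add(x, y, c):
    """Möbius addition of x and y on the Poincaré ball with curvature -c."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    numerator = (1.0 + 2.0 * c * xy + c * y2) * x + (1.0 - c * x2) * y
    return numerator / (1.0 + 2.0 * c * xy + c ** 2 * x2 * y2)

def exp_map(mu, v, c):
    """Exponential map at mu applied to a tangent vector v (Euclidean coordinates)."""
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return mu.copy()
    lam = 2.0 / (1.0 - c * np.dot(mu, mu))
    step = np.tanh(np.sqrt(c) * lam * norm_v / 2.0) * v / (np.sqrt(c) * norm_v)
    return mobius_add(mu, step, c)

def poincare_distance(z, y, c):
    """Geodesic distance on the Poincaré ball."""
    sq_diff = np.dot(z - y, z - y)
    denom = (1.0 - c * np.dot(z, z)) * (1.0 - c * np.dot(y, y))
    return np.arccosh(1.0 + 2.0 * c * sq_diff / denom) / np.sqrt(c)

def sample_radius(sigma, d, c, rng, grid_size=4096):
    """Grid-based stand-in: radial density is exp(-r^2 / (2 sigma^2)) * sinh(sqrt(c) r)^(d-1)."""
    r_max = (d - 1) * np.sqrt(c) * sigma ** 2 + 10.0 * sigma
    r = np.linspace(0.0, r_max, grid_size + 1)[1:]  # skip r = 0 to avoid log(0)
    log_w = -r ** 2 / (2.0 * sigma ** 2) + (d - 1) * np.log(np.sinh(np.sqrt(c) * r))
    w = np.exp(log_w - log_w.max())
    return rng.choice(r, p=w / w.sum())

def sample_riemannian_normal(mu, sigma, c, rng):
    """Draw z by sampling a radius and a uniform direction, then applying exp_mu."""
    d = mu.shape[0]
    r = sample_radius(sigma, d, c, rng)
    alpha = rng.standard_normal(d)
    alpha /= np.linalg.norm(alpha)  # uniform direction on the unit sphere
    lam = 2.0 / (1.0 - c * np.dot(mu, mu))
    z = exp_map(mu, (r / lam) * alpha, c)
    return z, r

rng = np.random.default_rng(0)
mu = np.array([0.3, -0.2])
z, r = sample_riemannian_normal(mu, sigma=0.5, c=1.0, rng=rng)
print(z, r, poincare_distance(mu, z, 1.0))
```

Dividing the tangent vector by $\lambda^c_\mu$ is what makes the geodesic distance from $\mu$ to the sample equal to the drawn radius $r$, which the final print statement lets you confirm.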

4. Radial Compensation and Balanced-Exponential Charts

Radial Compensation (RC) is an information-geometric mechanism that defines priors on the Poincaré ball so that the density depends only on the geodesic radius, ensuring invariance and disentanglement of model parameters from manifold curvature. Let $-c$ denote the (constant) curvature and $s_c(r) = \sinh(\sqrt{c}\, r)/\sqrt{c}$.

In tangent space $T_0 B^c_d$, the RC base density is

$p_T(v) = \frac{f(r)}{A_{d-1}\, r^{d-1}}$

where $r = \|v\|$ is the tangent norm (equal to the geodesic radius of the image point), $f$ is a 1D radial prior, and $A_{d-1} = 2\pi^{d/2}/\Gamma(d/2)$ is the area of the unit sphere $S^{d-1}$. After mapping to $B^c_d$ via the exponential map, the resulting density on the manifold is

$p_M(z) = \frac{f(r)}{A_{d-1}\, s_c(r)^{d-1}}$

where $r = d^c_p(0, z)$, guaranteeing that the marginal in geodesic radius matches $f$ exactly (Papamichals et al., 18 Nov 2025).
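The radial-marginal property can be illustrated with a small NumPy sketch. It is one construction consistent with the stated property, not necessarily the exact parameterization of the cited paper, and it rests on these assumptions: the radial prior is a HalfNormal with a chosen `scale`, directions are uniform on the sphere, and a point at geodesic radius $r$ from the origin sits at Euclidean radius $\tanh(\sqrt{c}\,r/2)/\sqrt{c}$. Under those assumptions the density with respect to the Riemannian volume is the radial prior divided by the area of the geodesic sphere of radius $r$. All function names are hypothetical.

```python
import math
import numpy as np

def sample_rc_prior(num_samples, d, c, scale, rng):
    """Draw geodesic radii from a HalfNormal prior and place points on the ball."""
    r = np.abs(rng.normal(0.0, scale, size=num_samples))
    u = rng.standard_normal((num_samples, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # uniform directions
    z = (np.tanh(math.sqrt(c) * r / 2.0) / math.sqrt(c))[:, None] * u
    return z, r

def geodesic_radius(z, c):
    """Geodesic distance from the origin for each row of z."""
    return 2.0 / math.sqrt(c) * np.arctanh(math.sqrt(c) * np.linalg.norm(z, axis=1))

def rc_log_density(z, d, c, scale):
    """Log density w.r.t. Riemannian volume: log f(r) - log(area of geodesic sphere)."""
    r = geodesic_radius(z, c)
    log_f = 0.5 * math.log(2.0 / math.pi) - math.log(scale) - r ** 2 / (2.0 * scale ** 2)
    log_unit_area = math.log(2.0) + (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0)
    log_s = np.log(np.sinh(math.sqrt(c) * r) / math.sqrt(c))
    return log_f - log_unit_area - (d - 1) * log_s

for c in (0.5, 2.0):
    rng = np.random.default_rng(0)
    z, r = sample_rc_prior(20000, d=3, c=c, scale=1.0, rng=rng)
    # The mean geodesic radius is the same for both curvatures.
    print(c, r.mean(), geodesic_radius(z, c).mean())
```

Because the radius is drawn directly from the 1D prior, changing the curvature moves the points in the ball but leaves the distribution of geodesic radii untouched.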

Balanced-Exponential (bExp) charts are a parametric family of lifts from the tangent space to $B^c_d$ that interpolate between the volume-preserving Lambert map at one end of the parameter range and the exponential map at the other. These charts balance numerical stability against geodesic distortion: the chart parameter lets the user trade volume distortion for geometry error without affecting semantic or statistical correctness, because under RC the induced densities and Fisher information remain invariant across all choices of the parameter.

5. Training Algorithms and Reparameterization Strategies

The training procedure for a Riemannian Normal Poincaré Ball VAE follows the standard VAE paradigm but incorporates manifold operations:

  • Encoder: Outputs unconstrained means and variances. The mean is mapped from $\mathbb{R}^d$ to $B^c_d$ via the exponential or bExp chart, and the variance is made positive via softplus.
  • Sampling: Latent $z$ is sampled using a reparameterization:
    • Draw noise $\epsilon$ from a fixed base distribution and set the tangent vector $v$ as a deterministic function of $\epsilon$ and the encoder outputs.
    • Map $v$ to $B^c_d$ via the selected bExp chart.
  • Decoder: Receives $z$ as input; initial layers may involve hyperbolic-specific (e.g., gyroplane) operators.
  • ELBO Computation: KL-divergence and likelihoods are computed using the explicit forms for Riemannian normal densities and chart Jacobians.
  • Backpropagation: Derivatives flow through the chart maps (exponential, bExp), as well as through the radius sampling (with possible adaptive rejection sampling, ARS, for non-Gaussian radial priors).
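The encoder step in the list above can be sketched in NumPy. This is a hedged illustration, not the reference implementation: it uses the exponential map at the origin, $\exp^c_0(v) = \tanh(\sqrt{c}\,\|v\|)\, \frac{v}{\sqrt{c}\,\|v\|}$, rather than a bExp chart, and it adds a boundary clip with a hypothetical margin `eps` because `tanh` saturates to 1 in floating point for large inputs. The names `exp_map_origin` and `encoder_head` are assumptions made for this example.

```python
import numpy as np

def softplus(x):
    """Numerically stable softplus, log(1 + exp(x))."""
    return np.logaddexp(0.0, x)

def exp_map_origin(v, c, eps=1e-5):
    """Map a batch of tangent vectors at the origin into the Poincaré ball."""
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12)
    z = np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)
    # Keep points strictly inside the ball: rescale any row that reaches the boundary.
    max_norm = (1.0 - eps) / sqrt_c
    z_norm = np.maximum(np.linalg.norm(z, axis=-1, keepdims=True), 1e-12)
    return z * np.minimum(1.0, max_norm / z_norm)

def encoder_head(raw_mean, raw_scale, c):
    """Turn unconstrained encoder outputs into a mean on the ball and a positive scale."""
    mu = exp_map_origin(raw_mean, c)
    sigma = softplus(raw_scale) + 1e-6
    return mu, sigma

rng = np.random.default_rng(0)
raw_mean = 5.0 * rng.standard_normal((4, 3))   # stand-in for network outputs
raw_scale = rng.standard_normal((4, 1))
mu, sigma = encoder_head(raw_mean, raw_scale, c=1.0)
print(np.linalg.norm(mu, axis=-1), sigma.ravel())
```

The clip changes the map only for inputs whose image would land on or beyond the numerical boundary, so gradients for ordinary inputs are those of the exponential map itself.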

Empirical recommendations restrict the bExp chart parameter to a preferred range, which reduces variance and keeps training efficient without significant loss in ELBO or NLL performance. The RC-bExp approach also stabilizes latent flows and controls radius blow-up in high-dimensional settings (Papamichals et al., 18 Nov 2025).

6. Empirical and Theoretical Advantages for Hierarchical Data

Tree-structured and hierarchical data exhibit combinatorial branching that is naturally modeled in hyperbolic spaces, where both area and volume scale exponentially with radius. The Poincaré ball correctly mirrors this growth: distances from the root grow linearly in hierarchical depth, but the number of points at a fixed depth (the volume) grows exponentially. This geometric property enables hyperbolic latent spaces to embed large trees with lower distortion compared to Euclidean analogues.
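The growth argument can be made concrete with a short calculation. The sketch below compares the area of a geodesic sphere of radius $r$ in hyperbolic space of curvature $-c$, namely $\frac{2\pi^{d/2}}{\Gamma(d/2)} \left(\sinh(\sqrt{c}\,r)/\sqrt{c}\right)^{d-1}$, with its Euclidean counterpart and with the node count of a binary tree at the same depth. Using a binary tree and equating depth with radius are illustrative assumptions, not choices made in the cited papers.

```python
import math

def unit_sphere_area(d):
    """Surface area of the unit sphere S^{d-1} embedded in R^d."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)

def hyperbolic_sphere_area(r, d, c=1.0):
    """Area of a geodesic sphere of radius r in hyperbolic space of curvature -c."""
    return unit_sphere_area(d) * (math.sinh(math.sqrt(c) * r) / math.sqrt(c)) ** (d - 1)

def euclidean_sphere_area(r, d):
    """Area of a sphere of radius r in Euclidean space."""
    return unit_sphere_area(d) * r ** (d - 1)

# Columns: depth, binary-tree nodes at that depth, Euclidean area, hyperbolic area (d = 2).
for depth in range(1, 11):
    print(depth, 2 ** depth,
          round(euclidean_sphere_area(depth, 2), 1),
          round(hyperbolic_sphere_area(depth, 2), 1))
```

The Euclidean column grows polynomially while the hyperbolic column grows by a factor approaching $e^{\sqrt{c}(d-1)}$ per unit of radius, which is why a tree whose node count doubles at each level can be laid out with roughly constant spacing only in the hyperbolic case.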

Experimental results demonstrate that Poincaré VAEs with Riemannian normal latents exhibit superior generalization to unseen data and more accurate recovery of latent hierarchical structure than Euclidean VAEs. Applications include synthetic branching processes, hierarchical classification on MNIST, and network link prediction (Mathieu et al., 2019). The RC-bExp approach further provides stable training and interpretable hyperparameters even with large latent dimensions and varying curvatures (Papamichals et al., 18 Nov 2025).

7. Implementation and Hyperparameter Considerations

Key hyperparameters and design considerations include:

  • Chart dial (the bExp chart parameter): Controls the trade-off between volume distortion and geodesic accuracy; the recommended default is robust for most use cases.
  • Curvature $c$: Treated as a geometric (not statistical) parameter under RC; it can be learned, as in Mixed-Curvature VAEs, for additional flexibility and interpretability.
  • Radial prior: Any 1D family (Normal, HalfNormal, Gamma, Weibull, LogNormal, Cauchy) is permissible under the RC construction.
  • Dimension $d$: RC-bExp prevents pathological radius blow-ups even in high dimensions.

The training pseudocode provided in (Papamichals et al., 18 Nov 2025) outlines the main steps, with RC entering precisely in the calculation of the prior log-density and the chart parameter affecting only the Jacobian term.

In sum, Riemannian Normal Poincaré Ball VAEs, equipped with Radial Compensation and bExp charts, comprise a rigorous geometric framework for hierarchical generative modeling, combining theoretical optimality in representation with practical stability and interpretability in modern deep generative architectures (Mathieu et al., 2019, Papamichals et al., 18 Nov 2025).
