Riemannian Normal Poincaré Ball VAE
- The paper introduces a VAE framework that leverages Riemannian normal distributions on the Poincaré ball to efficiently encode hierarchical, tree-like data.
- It employs hyperbolic geometry with a geodesic reparameterization, and later work adds Radial Compensation (RC) priors and Balanced-Exponential (bExp) charts for numerically stable training and semantically invariant priors.
- Empirical evaluations demonstrate improved generalization and lower distortion in hierarchical tasks compared to traditional Euclidean VAEs.
The Riemannian Normal Poincaré Ball VAE is a framework for probabilistic generative modeling in which the latent variables live in hyperbolic space, specifically the Poincaré ball model, rather than in a conventional Euclidean space. It uses the Riemannian normal distribution defined with respect to the Poincaré ball geometry, which allows hierarchical and tree-like data to be represented efficiently, with theoretical and empirical advantages over Euclidean latent embeddings. More recent work introduces Radial Compensation (RC) and Balanced-Exponential (bExp) charts, which enable numerically robust training and semantically invariant priors on these manifolds (Mathieu et al., 2019, Papamichals et al., 18 Nov 2025).
1. Hyperbolic Geometry and the Poincaré Ball
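Let $d$ be the latent dimension and $c > 0$ the absolute (positive) curvature. The Poincaré ball of dimension $d$ and (negative) curvature $-c$ is defined as
$\mathbb{B}^d_c = \{ z \in \mathbb{R}^d : \|z\| < 1/\sqrt{c} \}$
with the Riemannian metric
$g^c_z(u, v) = (\lambda^c_z)^2 \langle u, v \rangle, \qquad \lambda^c_z = \frac{2}{1 - c\|z\|^2},$
where $\langle \cdot, \cdot \rangle$ denotes the Euclidean inner product and $\lambda^c_z$ is the conformal factor. The geodesic distance between $z, y \in \mathbb{B}^d_c$ is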
$d^c_p(z, y) = \frac{1}{\sqrt{c}} \operatorname{arccosh} \left( 1 + \frac{2c \|z-y\|^2}{(1-c\|z\|^2)(1-c\|y\|^2)} \right)$
The volume of a geodesic ball in this geometry grows exponentially with its radius, a property that matches the exponential growth in the number of nodes with depth in hierarchical data.
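The conformal factor and distance formula translate directly into code. The following is a minimal pure-Python sketch (the function names are illustrative, not taken from the source) that represents points as lists of floats and assumes they lie strictly inside the ball.

```python
import math


def conformal_factor(z, c):
    """lambda^c_z = 2 / (1 - c * ||z||^2) for a point strictly inside the ball."""
    sq_norm = sum(a * a for a in z)
    if c * sq_norm >= 1.0:
        raise ValueError("point lies on or outside the Poincare ball")
    return 2.0 / (1.0 - c * sq_norm)


def poincare_distance(z, y, c):
    """Geodesic distance d^c_p(z, y) on the Poincare ball of curvature -c."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(z, y))
    z_sq = sum(a * a for a in z)
    y_sq = sum(b * b for b in y)
    arg = 1.0 + 2.0 * c * diff_sq / ((1.0 - c * z_sq) * (1.0 - c * y_sq))
    return math.acosh(arg) / math.sqrt(c)


if __name__ == "__main__":
    c = 1.0
    # Equal Euclidean steps cover far more geodesic distance near the boundary.
    print(poincare_distance([0.0, 0.0], [0.05, 0.0], c))  # about 0.100
    print(poincare_distance([0.90, 0.0], [0.95, 0.0], c))  # about 0.719
```

The growing gap between the two printed distances is the exponential volume growth in action: the same Euclidean step near the boundary spans much more of the hyperbolic space.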
2. The Riemannian Normal Distribution on the Poincaré Ball
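The Riemannian normal ("maximum-entropy" normal) distribution on $\mathbb{B}^d_c$ is defined by
$\mathcal{N}^R_{\mathbb{B}^d_c}(z \mid \mu, \sigma^2) = \frac{1}{Z^R(\sigma)} \exp\left( -\frac{d^c_p(\mu, z)^2}{2\sigma^2} \right)$
where $\mu \in \mathbb{B}^d_c$ is the Fréchet mean, $\sigma > 0$ the scale, and the density is taken with respect to the Riemannian volume $d\mathcal{M}(z) = (\lambda^c_z)^d \, dz$. The normalization constant is
$Z^R(\sigma) = \mathcal{S}_{d-1} \int_0^\infty e^{-\frac{r^2}{2\sigma^2}} \left( \frac{\sinh(\sqrt{c}\, r)}{\sqrt{c}} \right)^{d-1} dr$
where $\mathcal{S}_{d-1} = 2\pi^{d/2} / \Gamma(d/2)$ is the surface area of the unit $(d-1)$-sphere. $Z^R$ depends on $\sigma$ but not on $\mu$, and the density depends on $z$ only through $d^c_p(\mu, z)$, ensuring isotropy about $\mu$.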
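Because $Z^R(\sigma)$ reduces to a one-dimensional radial integral, it can be evaluated numerically. The sketch below uses composite Simpson quadrature (our choice for illustration, not necessarily how the source computes it) and is intended only for moderate $\sigma$, $c$ and $d$, where $\sinh^{d-1}$ does not overflow. All function names are hypothetical. For $d = 2$, completing the square gives the closed form $Z^R = \frac{2\pi}{\sqrt{c}}\,\sigma\sqrt{\pi/2}\, e^{c\sigma^2/2}\operatorname{erf}(\sqrt{c}\,\sigma/\sqrt{2})$, which the usage check compares against.

```python
import math


def sphere_area(d):
    """Surface area S_{d-1} = 2 pi^{d/2} / Gamma(d/2) of the unit (d-1)-sphere."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)


def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b]; n is rounded up to an even count."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def normalizer(sigma, d, c):
    """Z^R(sigma) = S_{d-1} * int_0^inf exp(-r^2 / 2 sigma^2) (sinh(sqrt(c) r) / sqrt(c))^(d-1) dr."""
    sc = math.sqrt(c)

    def integrand(r):
        return math.exp(-r * r / (2.0 * sigma ** 2)) * (math.sinh(sc * r) / sc) ** (d - 1)

    # The integrand is bounded by a multiple of a Gaussian bump centred at
    # (d - 1) sqrt(c) sigma^2, so truncating 12 sigma beyond it loses a negligible tail.
    upper = (d - 1) * sc * sigma ** 2 + 12.0 * sigma
    return sphere_area(d) * simpson(integrand, 0.0, upper)


def riemannian_normal_log_density(dist, sigma, d, c):
    """log N^R(z | mu, sigma^2), given dist = d^c_p(mu, z)."""
    return -dist ** 2 / (2.0 * sigma ** 2) - math.log(normalizer(sigma, d, c))


if __name__ == "__main__":
    sigma, c = 0.5, 1.0
    closed_form = (2.0 * math.pi / math.sqrt(c) * sigma * math.sqrt(math.pi / 2.0)
                   * math.exp(c * sigma ** 2 / 2.0) * math.erf(math.sqrt(c) * sigma / math.sqrt(2.0)))
    print(normalizer(sigma, 2, c), closed_form)
```

In $d = 1$ the hyperbolic line is isometric to $\mathbb{R}$, so the normalizer reduces to the Gaussian constant $\sigma\sqrt{2\pi}$ for any $c$, which is a convenient sanity check.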
3. Variational Auto-Encoders in the Poincaré Ball and Riemannian ELBO
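Given data $x$, the generative process posits latent variables $z \in \mathbb{B}^d_c$ sampled from a Riemannian normal prior $p(z) = \mathcal{N}^R_{\mathbb{B}^d_c}(z \mid 0, \sigma_0^2)$ and observations modeled by a decoder $p_\theta(x \mid z)$. The approximate posterior (inference model) is also a Riemannian normal, $q_\phi(z \mid x) = \mathcal{N}^R_{\mathbb{B}^d_c}(z \mid \mu_\phi(x), \sigma_\phi(x)^2)$. The evidence lower bound (ELBO) in this setting is
$\mathcal{L}(x; \theta, \phi) = \mathbb{E}_{q_\phi(z \mid x)}\left[ \log p_\theta(x \mid z) \right] - \mathrm{KL}\left( q_\phi(z \mid x) \,\|\, p(z) \right)$
where the KL-divergence $\mathrm{KL}(q_\phi(z \mid x) \,\|\, p(z))$ is evaluated using densities calculated with respect to the Riemannian volume (Mathieu et al., 2019). Sampling from the Riemannian normal uses a reparameterization in geodesic polar coordinates: sample a distance $r > 0$ with density proportional to $e^{-r^2/(2\sigma^2)} \left( \sinh(\sqrt{c}\, r)/\sqrt{c} \right)^{d-1}$ and a direction $\alpha$ uniformly on the unit sphere $\mathbb{S}^{d-1}$, then compute $z = \exp^c_\mu\left( \frac{r}{\lambda^c_\mu} \alpha \right)$.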
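A minimal pure-Python sketch of this geodesic polar reparameterization follows. Assumptions, all ours: the radius is drawn with a simple exact rejection sampler (a Gaussian proposal centred at $(d-1)\sqrt{c}\,\sigma^2$, accepted with probability $(1 - e^{-2\sqrt{c}\,r})^{d-1}$), which stands in for whatever sampler the source uses; gradients with respect to $\sigma$ are not handled; and $\exp^c_\mu$ is written with the standard Möbius addition $\oplus_c$. All function names are hypothetical.

```python
import math
import random


def mobius_add(x, y, c):
    """Mobius addition x (+)_c y on the Poincare ball of curvature -c."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    coef_x = 1.0 + 2.0 * c * xy + c * y2
    coef_y = 1.0 - c * x2
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / denom for a, b in zip(x, y)]


def exp_map(mu, v, c):
    """exp^c_mu(v) = mu (+)_c (tanh(sqrt(c) lambda_mu ||v|| / 2) v / (sqrt(c) ||v||))."""
    norm_v = math.sqrt(sum(a * a for a in v))
    if norm_v == 0.0:
        return list(mu)
    sc = math.sqrt(c)
    lam = 2.0 / (1.0 - c * sum(a * a for a in mu))
    scale = math.tanh(sc * lam * norm_v / 2.0) / (sc * norm_v)
    return mobius_add(mu, [scale * a for a in v], c)


def sample_radius(sigma, d, c, rng):
    """Exact rejection sampler for rho(r) ~ exp(-r^2 / 2 sigma^2) sinh(sqrt(c) r)^(d-1), r > 0."""
    sc = math.sqrt(c)
    shift = (d - 1) * sc * sigma ** 2
    while True:
        r = rng.gauss(shift, sigma)
        # target / proposal is proportional to (1 - exp(-2 sqrt(c) r))^(d-1) <= 1 on r > 0
        if r > 0.0 and rng.random() < (1.0 - math.exp(-2.0 * sc * r)) ** (d - 1):
            return r


def sample_direction(d, rng):
    """Uniform direction on the unit sphere S^{d-1} via a normalised Gaussian vector."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(a * a for a in g))
        if norm > 0.0:
            return [a / norm for a in g]


def sample_riemannian_normal(mu, sigma, c, rng):
    """Draw z ~ N^R(mu, sigma^2) and also return the sampled geodesic distance r."""
    d = len(mu)
    r = sample_radius(sigma, d, c, rng)
    alpha = sample_direction(d, rng)
    lam = 2.0 / (1.0 - c * sum(a * a for a in mu))
    z = exp_map(mu, [r / lam * a for a in alpha], c)
    return z, r


def poincare_distance(z, y, c):
    """Geodesic distance d^c_p(z, y), used here only to check the construction."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(z, y))
    z_sq = sum(a * a for a in z)
    y_sq = sum(b * b for b in y)
    arg = 1.0 + 2.0 * c * diff_sq / ((1.0 - c * z_sq) * (1.0 - c * y_sq))
    return math.acosh(arg) / math.sqrt(c)
```

Scaling the tangent vector by $1/\lambda^c_\mu$ gives it Riemannian norm $r$, so each sample lands at geodesic distance exactly $r$ from $\mu$; for example, `sample_riemannian_normal([0.1, -0.2], 0.5, 1.0, random.Random(0))` returns a point and its distance.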
4. Radial Compensation and Balanced-Exponential Charts
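Radial Compensation (RC) is an information-geometric mechanism that defines priors on the Poincaré ball whose density depends only on the geodesic radius, ensuring invariance and disentangling the model parameters from the manifold curvature. Let $c > 0$ denote the (constant) absolute curvature and $r = d^c_p(0, z)$ the geodesic radius of $z \in \mathbb{B}^d_c$.
In the tangent space $T_0\mathbb{B}^d_c$, the RC base density is
$p_T(v) = \frac{\rho(\|v\|_0)}{\mathcal{S}_{d-1} \|v\|_0^{\,d-1}}$
where $\|v\|_0 = \lambda^c_0 \|v\|$ is the Riemannian norm at the origin, $\rho$ is a 1D radial prior, and $\mathcal{S}_{d-1}$ is the surface area of the unit $(d-1)$-sphere, with $\mathcal{S}_{d-1} = 2\pi^{d/2}/\Gamma(d/2)$. After mapping to $\mathbb{B}^d_c$ via the exponential map $\exp^c_0$, which sends a tangent vector of Riemannian norm $r$ to a point at geodesic radius $r$, the resulting density on the manifold (with respect to the Riemannian volume) is
$p(z) = \frac{\rho(r)}{\mathcal{S}_{d-1}\, s_c(r)^{d-1}}$
where $s_c(r) = \sinh(\sqrt{c}\, r)/\sqrt{c}$, so that $\mathcal{S}_{d-1} s_c(r)^{d-1}$ is the area of the geodesic sphere of radius $r$. This guarantees that the marginal in geodesic radius matches $\rho$ exactly (Papamichals et al., 18 Nov 2025).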
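The manifold form of the RC prior only needs the geodesic radius of $z$, which at the origin has the closed form $r = \frac{2}{\sqrt{c}}\operatorname{artanh}(\sqrt{c}\,\|z\|)$. The sketch below evaluates $\log p(z)$ for a user-supplied radial log-density. The half-normal choice of $\rho$ and all function names are illustrative assumptions; for $d > 1$ it requires $z \neq 0$, because the density is singular at the origin unless $\rho(r)$ vanishes at least as fast as $r^{d-1}$.

```python
import math


def sphere_area(d):
    """Surface area S_{d-1} = 2 pi^{d/2} / Gamma(d/2) of the unit (d-1)-sphere."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)


def half_normal_log_pdf(scale):
    """Hypothetical radial prior rho: log-density of HalfNormal(scale) on r > 0."""
    def log_rho(r):
        return 0.5 * math.log(2.0 / math.pi) - math.log(scale) - r * r / (2.0 * scale ** 2)
    return log_rho


def geodesic_radius(z, c):
    """r = d^c_p(0, z) = (2 / sqrt(c)) artanh(sqrt(c) ||z||)."""
    sc = math.sqrt(c)
    return 2.0 / sc * math.atanh(sc * math.sqrt(sum(a * a for a in z)))


def rc_log_density(z, c, log_rho):
    """log p(z) = log rho(r) - log S_{d-1} - (d-1) log s_c(r), s_c(r) = sinh(sqrt(c) r) / sqrt(c)."""
    d = len(z)
    r = geodesic_radius(z, c)
    if d == 1:
        log_area_factor = 0.0  # s_c(r)^0 = 1 on the hyperbolic line
    else:
        sc = math.sqrt(c)
        log_area_factor = (d - 1) * math.log(math.sinh(sc * r) / sc)
    return log_rho(r) - math.log(sphere_area(d)) - log_area_factor
```

Two properties are easy to check with this sketch: the value depends on $z$ only through its geodesic radius, and multiplying $p(z)$ by the geodesic sphere area $\mathcal{S}_{d-1} s_c(r)^{d-1}$ recovers $\rho(r)$. In $d = 1$ with a half-normal $\rho$, the RC prior reduces to an ordinary Gaussian in arc length.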
Balanced-Exponential (bExp) charts are a one-parameter family of lifts $\phi_\beta: T_0\mathbb{B}^d_c \to \mathbb{B}^d_c$ that interpolate between the volume-preserving Lambert map and the exponential map, which sit at the two extremes of the dial $\beta$. These charts balance numerical stability against geodesic distortion: $\beta$ lets the user trade volume distortion for geometry error without affecting semantic or statistical correctness, because under RC the induced densities and Fisher information remain invariant across all choices of $\beta$.
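The bExp family itself is not spelled out here, so the sketch below only illustrates its two endpoints in $d = 2$, as radial maps from a tangent norm $t$ (the Riemannian norm at the origin) to a geodesic radius. It assumes the Lambert map is the hyperbolic equal-area (azimuthal) map, for which the tangent-plane disk area $\pi t^2$ equals the hyperbolic disk area $2\pi(\cosh(\sqrt{c}\,r) - 1)/c$; the interpolating parameterization is not reproduced, and all names are hypothetical.

```python
import math


def hyperbolic_disk_area(r, c):
    """Area of a geodesic disk of radius r in B^2_c: 2 pi (cosh(sqrt(c) r) - 1) / c."""
    return 2.0 * math.pi * (math.cosh(math.sqrt(c) * r) - 1.0) / c


def exp_chart_radius(t):
    """Exponential map at the origin: tangent norm t lands at geodesic radius t."""
    return t


def lambert_chart_radius(t, c):
    """Equal-area radial map (d = 2): solve pi t^2 = hyperbolic_disk_area(r, c) for r."""
    return math.acosh(1.0 + c * t * t / 2.0) / math.sqrt(c)


def point_at_radius(r, direction, c):
    """Ball coordinates of the point at geodesic radius r along a unit direction."""
    scale = math.tanh(math.sqrt(c) * r / 2.0) / math.sqrt(c)
    return [scale * a for a in direction]


if __name__ == "__main__":
    c = 1.0
    for t in (0.5, 2.0, 4.0):
        tangent_area = math.pi * t * t
        inflation = hyperbolic_disk_area(exp_chart_radius(t), c) / tangent_area  # > 1
        preserved = hyperbolic_disk_area(lambert_chart_radius(t, c), c) / tangent_area  # about 1
        print(t, inflation, preserved)
```

The exponential endpoint keeps geodesic radii exact but inflates area exponentially with $t$, while the equal-area endpoint preserves area but compresses radii; this is the trade-off the dial is described as tuning.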
5. Training Algorithms and Reparameterization Strategies
The training procedure for a Riemannian Normal Poincaré Ball VAE follows the standard VAE paradigm but incorporates manifold operations:
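- Encoder: Outputs unconstrained means and variances. The mean is mapped from $\mathbb{R}^d \cong T_0\mathbb{B}^d_c$ to $\mathbb{B}^d_c$ via the exponential map or a bExp chart, and the variance is made positive via softplus.
- Sampling: The latent $z$ is sampled using a reparameterization:
  - Draw noise $\epsilon$ and use it, together with the encoder outputs, to form a tangent vector $v \in T_0\mathbb{B}^d_c$.
  - Map $v$ to $z \in \mathbb{B}^d_c$ via $z = \phi_\beta(v)$, where $\phi_\beta$ is the selected bExp chart.
- Decoder: Receives $z$ as input; initial layers may involve hyperbolic-specific operators (e.g., gyroplane layers).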
- ELBO Computation: KL-divergence and likelihoods are computed using the explicit forms for Riemannian normal densities and chart Jacobians.
- Backpropagation: Derivatives flow through the chart maps (exponential, bExp) as well as through the radius sampling, which may use adaptive rejection sampling (ARS) for non-Gaussian radial priors $\rho$.
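The encoder step above can be sketched as follows, assuming tangent vectors at the origin are given in Euclidean coordinates, so that $\exp^c_0(v) = \tanh(\sqrt{c}\,\|v\|)\, v / (\sqrt{c}\,\|v\|)$, and that outputs are kept a small margin `eps` inside the boundary. That margin is a common numerical safeguard and our assumption, not part of the source; the function names are hypothetical.

```python
import math


def softplus(s):
    """Numerically stable softplus log(1 + exp(s)), always > 0."""
    return max(s, 0.0) + math.log1p(math.exp(-abs(s)))


def exp0(v, c, eps=1e-5):
    """exp^c_0(v) = tanh(sqrt(c) ||v||) v / (sqrt(c) ||v||), kept a margin inside the ball."""
    norm = math.sqrt(sum(a * a for a in v))
    if norm == 0.0:
        return [0.0] * len(v)
    sc = math.sqrt(c)
    # tanh saturates to exactly 1.0 in floating point for large norms; cap it.
    ball_radius = min(math.tanh(sc * norm), 1.0 - eps) / sc
    return [ball_radius * a / norm for a in v]


def encoder_head(raw_mean, raw_scale, c):
    """Map unconstrained encoder outputs to (mu in B^d_c, sigma > 0)."""
    return exp0(raw_mean, c), softplus(raw_scale)


if __name__ == "__main__":
    mu, sigma = encoder_head([0.3, 0.4], -1.0, c=1.0)
    # Geodesic radius of mu is 2 * ||raw_mean|| = 1.0 here (no capping needed).
    print(mu, sigma, 2.0 * math.atanh(math.sqrt(sum(a * a for a in mu))))
```

Because $\lambda^c_0 = 2$, a raw mean of Euclidean norm $n$ lands at geodesic radius $2n$ from the origin; very large raw means are where the cap matters.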
Empirical recommendations identify a parameter range that reduces variance and keeps training efficient without significant loss in ELBO or NLL performance. The RC-bExp approach also stabilizes latent flows and controls radius blow-up in high-dimensional settings (Papamichals et al., 18 Nov 2025).
6. Empirical and Theoretical Advantages for Hierarchical Data
Tree-structured and hierarchical data exhibit combinatorial branching that is naturally modeled in hyperbolic spaces, where both the area of geodesic spheres and the volume of geodesic balls grow exponentially with radius. The Poincaré ball mirrors this growth: distances from the root grow linearly with hierarchical depth, while the number of nodes at a given depth grows exponentially, just as the volume available at the corresponding radius does. This geometric property enables hyperbolic latent spaces to embed large trees with lower distortion than Euclidean analogues.
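To make the growth claim concrete, the sketch below compares the volume of a geodesic ball of radius $r$ in $\mathbb{B}^d_c$, $\mathcal{S}_{d-1}\int_0^r (\sinh(\sqrt{c}\,t)/\sqrt{c})^{d-1}\,dt$, with the Euclidean ball volume $\pi^{d/2} r^d / \Gamma(d/2 + 1)$. It is a plain numerical illustration with hypothetical names; for $d = 2$ and $c = 1$ the hyperbolic value is $2\pi(\cosh r - 1)$, which grows like $e^r$ while the Euclidean area grows like $r^2$.

```python
import math


def hyperbolic_ball_volume(r, d, c, n=2000):
    """Volume of a geodesic ball of radius r in B^d_c (composite Simpson rule, n even)."""
    sc = math.sqrt(c)
    sphere = 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)

    def shell(t):
        # surface area of the geodesic sphere of radius t, up to the factor S_{d-1}
        return (math.sinh(sc * t) / sc) ** (d - 1)

    h = r / n
    total = shell(0.0) + shell(r)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * shell(i * h)
    return sphere * total * h / 3.0


def euclidean_ball_volume(r, d):
    """pi^{d/2} r^d / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)


if __name__ == "__main__":
    for r in (1.0, 3.0, 6.0):
        ratio = hyperbolic_ball_volume(r, 2, 1.0) / euclidean_ball_volume(r, 2)
        print(f"r = {r}: hyperbolic / Euclidean area = {ratio:.2f}")
```

As $c \to 0$ the hyperbolic volume approaches the Euclidean one, so the curvature controls how quickly room for new branches opens up with radius.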
Experimental results demonstrate that Poincaré VAEs with Riemannian normal latents exhibit superior generalization to unseen data and more accurate recovery of latent hierarchical structure than Euclidean VAEs. Applications include synthetic branching processes, hierarchical classification on MNIST, and network link prediction (Mathieu et al., 2019). The RC-bExp approach further provides stable training and interpretable hyperparameters even with large latent dimensions and varying curvatures (Papamichals et al., 18 Nov 2025).
7. Implementation and Hyperparameter Considerations
Key hyperparameters and design considerations include:
- Chart dial $\beta$: Controls the trade-off between volume distortion and geodesic accuracy; a fixed default value is reported to be robust for most use cases.
- Curvature $c$: Treated as a geometric (not statistical) parameter under RC; it can be learned, as in Mixed-Curvature VAEs, for additional flexibility and interpretability.
- Radial prior $\rho$: Any 1D family (Normal, HalfNormal, Gamma, Weibull, LogNormal, Cauchy) is permissible under the RC construction.
- Dimension $d$: RC-bExp prevents pathological radius blow-ups even in high dimensions.
The training pseudocode provided in (Papamichals et al., 18 Nov 2025) outlines the main steps, with RC entering precisely in the calculation of the prior log-density and the chart dial $\beta$ affecting only the Jacobian term.
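The split between the prior log-density and the chart Jacobian is the usual change-of-variables identity $\log p_T(v) = \log p(\phi(v)) + \log\lvert\det D\phi(v)\rvert$. The sketch below checks it for the exponential chart, whose Jacobian from the tangent space (Riemannian norm at the origin) to the manifold volume is $(s_c(t)/t)^{d-1}$ with $t = \|v\|_0$; a bExp chart would substitute its own Jacobian, which is not reproduced here. The half-normal radial prior and all names are illustrative assumptions.

```python
import math


def sphere_area(d):
    """Surface area S_{d-1} = 2 pi^{d/2} / Gamma(d/2) of the unit (d-1)-sphere."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)


def log_s(r, c):
    """log s_c(r) with s_c(r) = sinh(sqrt(c) r) / sqrt(c)."""
    sc = math.sqrt(c)
    return math.log(math.sinh(sc * r) / sc)


def rc_prior_log_density(r, d, c, log_rho):
    """RC prior on the manifold as a function of the geodesic radius r > 0."""
    return log_rho(r) - math.log(sphere_area(d)) - (d - 1) * log_s(r, c)


def exp_chart_log_jacobian(t, d, c):
    """log |det D exp_0| at tangent norm t > 0: the chart's only contribution."""
    return (d - 1) * (log_s(t, c) - math.log(t))


def tangent_log_density(t, d, c, log_rho):
    """Pull the RC prior back through the exponential chart, which maps norm t to radius t."""
    return rc_prior_log_density(t, d, c, log_rho) + exp_chart_log_jacobian(t, d, c)


def half_normal_log_pdf(r, scale=0.5):
    """Hypothetical radial prior rho = HalfNormal(scale)."""
    return 0.5 * math.log(2.0 / math.pi) - math.log(scale) - r * r / (2.0 * scale ** 2)


if __name__ == "__main__":
    # The curvature cancels: the tangent-space density is the same for every c.
    for c in (0.1, 1.0, 4.0):
        print(c, tangent_log_density(0.8, 5, c, half_normal_log_pdf))
```

The $s_c$ terms cancel, leaving $\log\rho(t) - \log\mathcal{S}_{d-1} - (d-1)\log t$, which is independent of $c$; this is consistent with treating the curvature as a geometric rather than statistical parameter under RC.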
In sum, Riemannian Normal Poincaré Ball VAEs, equipped with Radial Compensation and bExp charts, comprise a rigorous geometric framework for hierarchical generative modeling, combining low-distortion representations of hierarchies with practical stability and interpretability in modern deep generative architectures (Mathieu et al., 2019, Papamichals et al., 18 Nov 2025).