Hyperbolic Residual Quantization (HRQ)
- Hyperbolic Residual Quantization (HRQ) is a discrete representation framework that uses hyperbolic geometry to accurately model exponentially branching, tree-structured data.
- By replacing Euclidean operations with Möbius counterparts, HRQ aligns quantization with the natural exponential growth of hierarchies, ensuring lower distortion in deep branches.
- Empirical evaluations show that HRQ improves Recall@10 by up to 20% (relative) over standard Euclidean Residual Quantization in hierarchy modeling, with smaller but consistent gains in recommendation tasks.
Hyperbolic Residual Quantization (HRQ) is a discrete representation learning framework designed to model data with latent hierarchical structures by performing residual quantization in a hyperbolic manifold, specifically the Poincaré ball. HRQ extends the standard Residual Quantization (RQ) paradigm by replacing Euclidean geometry with hyperbolic geometry throughout the embedding network, residual computation, and codebook lookup operations. This modification imparts an inductive bias aligned with exponentially branching, tree-like data, enabling more faithful and effective modeling of hierarchies. HRQ generates multi-token discrete representations that are empirically more suitable for downstream tasks involving hierarchical relationships than Euclidean RQ, with observed performance gains of up to 20% in hierarchy-structured evaluation scenarios (Piękos et al., 18 May 2025).
1. Background and Motivation
Residual Quantization (RQ) represents a vector $z \in \mathbb{R}^d$ by iteratively quantizing a sequence of residuals using $M$ codebooks $C^{(1)}, \dots, C^{(M)}$, each containing $K$ codewords. Each quantization step selects a token $k_m = \arg\min_k \lVert r_m - c^{(m)}_k \rVert$ indexing the closest codeword and updates the residual $r_{m+1} = r_m - c^{(m)}_{k_m}$ (with $r_1 = z$), yielding a multitoken encoding $(k_1, \dots, k_M)$ and a reconstruction

$$\hat z = \sum_{m=1}^{M} c^{(m)}_{k_m}.$$
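For reference, here is a minimal PyTorch sketch of this Euclidean baseline; sizes and names are illustrative, not taken from the paper:

```python
import torch

def residual_quantize(z, codebooks):
    """Euclidean residual quantization (baseline sketch).

    z: (d,) input vector; codebooks: list of (K, d) codeword tensors.
    Returns the selected token indices and the summed reconstruction.
    """
    residual, tokens = z, []
    recon = torch.zeros_like(z)
    for C in codebooks:
        # Index of the codeword closest to the current residual.
        k = torch.argmin(torch.cdist(residual[None], C)).item()
        tokens.append(k)
        recon = recon + C[k]        # reconstruction is a plain Euclidean sum
        residual = residual - C[k]  # Euclidean residual update
    return tokens, recon

# Example with illustrative sizes: 3 levels, 8 codewords, dimension 4.
tokens, recon = residual_quantize(torch.randn(4), [torch.randn(8, 4) for _ in range(3)])
```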
However, Euclidean RQ struggles with data exhibiting hierarchical branching. Real-world hierarchies, such as WordNet hypernym structures, exhibit exponential growth in the number of nodes with depth. In contrast, Euclidean space's polynomial volume growth leads to "cramped" representations, insufficient separation of subtrees, and distortion of the hierarchical signal, particularly in deep branches. Empirical analyses demonstrate that RQ-VAEs incur substantial distortion at the deeper quantization levels when applied to such data.
Hyperbolic space possesses a fundamentally different volume growth property: in $\mathbb{H}^d$ with negative curvature $-c$ ($c > 0$), the volume of a ball grows exponentially with its radius. This matches the geometry of tree-structured data, allowing embeddings to reflect both the depth and branching of hierarchies with low distortion. Any finite tree can be embedded in two-dimensional hyperbolic space with arbitrarily low distortion. Thus, HRQ leverages these properties to overcome the geometric mismatch (Piękos et al., 18 May 2025).
2. Foundations: Hyperbolic Geometry
HRQ adopts the $d$-dimensional Poincaré ball model $\mathbb{B}^d_c = \{x \in \mathbb{R}^d : c\lVert x\rVert^2 < 1\}$, with Riemannian metric

$$g^{\mathbb{B}}_x = (\lambda^c_x)^2\, g^E, \qquad \lambda^c_x = \frac{2}{1 - c\lVert x\rVert^2},$$

where $g^E$ is the Euclidean metric.
Key hyperbolic operations for HRQ include:
- Möbius Addition: The hyperbolic analogue of vector addition for $x, y \in \mathbb{B}^d_c$ is

$$x \oplus_c y = \frac{(1 + 2c\langle x, y\rangle + c\lVert y\rVert^2)\,x + (1 - c\lVert x\rVert^2)\,y}{1 + 2c\langle x, y\rangle + c^2\lVert x\rVert^2 \lVert y\rVert^2}.$$
Möbius subtraction is $x \ominus_c y = x \oplus_c (-y)$; the left-cancellation law $(-x) \oplus_c (x \oplus_c y) = y$ plays the role of an inverse in the residual computation below.
- Hyperbolic Distance: The geodesic distance between $x, y \in \mathbb{B}^d_c$ is

$$d_c(x, y) = \frac{2}{\sqrt{c}}\, \operatorname{artanh}\!\big(\sqrt{c}\, \lVert (-x) \oplus_c y \rVert\big).$$
- Exponential and Logarithmic Maps: These maps move between the tangent space at the origin and the manifold:

$$\exp_0^c(v) = \tanh\!\big(\sqrt{c}\,\lVert v\rVert\big)\, \frac{v}{\sqrt{c}\,\lVert v\rVert}, \qquad \log_0^c(y) = \operatorname{artanh}\!\big(\sqrt{c}\,\lVert y\rVert\big)\, \frac{y}{\sqrt{c}\,\lVert y\rVert}.$$
These operations are essential for constructing hyperbolic neural-network layers and for the quantization algorithm; a minimal sketch of them follows.
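The following PyTorch sketch implements the four operations above for curvature $c$; the clamping constants are illustrative numerical-stability choices, not values from the paper:

```python
import torch

def mobius_add(x, y, c=1.0):
    """Möbius addition on the Poincaré ball of curvature -c."""
    xy = (x * y).sum(-1, keepdim=True)
    x2 = (x * x).sum(-1, keepdim=True)
    y2 = (y * y).sum(-1, keepdim=True)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c**2 * x2 * y2
    return num / den

def hyp_dist(x, y, c=1.0):
    """Geodesic distance d_c(x, y) on the Poincaré ball."""
    diff = mobius_add(-x, y, c)
    norm = diff.norm(dim=-1).clamp(max=(1 - 1e-5) / c**0.5)
    return (2 / c**0.5) * torch.atanh(c**0.5 * norm)

def exp0(v, c=1.0):
    """Exponential map at the origin: tangent space -> ball."""
    n = v.norm(dim=-1, keepdim=True).clamp(min=1e-15)
    return torch.tanh(c**0.5 * n) * v / (c**0.5 * n)

def log0(y, c=1.0):
    """Logarithmic map at the origin: ball -> tangent space."""
    n = y.norm(dim=-1, keepdim=True).clamp(min=1e-15, max=(1 - 1e-5) / c**0.5)
    return torch.atanh(c**0.5 * n) * y / (c**0.5 * n)
```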
3. HRQ Architecture and Algorithm
HRQ replaces each Euclidean linear layer (and nonlinearity) in the encoder and decoder with its hyperbolic counterpart. Given a Euclidean input $x \in \mathbb{R}^n$, it is lifted to the manifold via the exponential map at the origin, $x^{\mathbb{B}} = \exp_0^c(x)$. A stack of Möbius-linear and hyperbolic nonlinearity layers then produces a latent encoding $z_e \in \mathbb{B}^d_c$.
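One standard way to realize a Möbius-linear layer, following Ganea et al.'s hyperbolic neural networks (the paper may use an equivalent formulation), is to apply a Euclidean linear map in the tangent space at the origin, reusing the helpers above:

```python
def mobius_linear(x, W, b=None, c=1.0):
    """Möbius-linear layer: log-map to the origin's tangent space, apply a
    Euclidean linear map, exp-map back, then Möbius-translate by a ball bias."""
    h = exp0(log0(x, c) @ W.T, c)
    return mobius_add(b, h, c) if b is not None else h
```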
HRQ maintains $M$ codebooks $C^{(1)}, \dots, C^{(M)} \subset \mathbb{B}^d_c$, each of size $K$. Starting with $r_1 = z_e$, the quantization proceeds as follows for $m = 1, \dots, M$:
- Token selection: $k_m = \arg\min_k d_c(r_m, c^{(m)}_k)$, with $c^{(m)}_k \in C^{(m)}$.
- Hyperbolic residual: $r_{m+1} = (-c^{(m)}_{k_m}) \oplus_c r_m$.
The multitoken representation is $(k_1, \dots, k_M)$. Reconstruction is performed by sequential Möbius addition:

$$z_q = c^{(1)}_{k_1} \oplus_c \Big( c^{(2)}_{k_2} \oplus_c \big( \cdots \oplus_c c^{(M)}_{k_M} \big) \Big).$$
The decoder mirrors the encoder: a stack of Möbius layers maps $z_q$ back, the logarithmic map $\log_0^c$ returns the result to Euclidean space, and the output is the reconstruction $\hat x$. Pseudo-code in the paper outlines this sequence of operations (Piękos et al., 18 May 2025); a hedged sketch of the quantization step is given below.
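A minimal sketch of the encode/quantize loop, reusing `mobius_add` and `hyp_dist` from above (variable names are illustrative, and the residual convention follows the left-cancellation form stated earlier):

```python
def hrq_quantize(z_e, codebooks, c=1.0):
    """Hyperbolic residual quantization on the Poincaré ball (sketch).

    z_e: (d,) latent on the ball; codebooks: list of (K, d) codeword
    tensors, also constrained to the ball.
    """
    residual, tokens = z_e, []
    for C in codebooks:
        # Nearest codeword under the geodesic distance.
        k = torch.argmin(hyp_dist(residual[None], C, c)).item()
        tokens.append(k)
        # Hyperbolic residual via Möbius subtraction (left cancellation).
        residual = mobius_add(-C[k], residual, c)
    # Reconstruction: fold the codewords back with Möbius addition,
    # innermost (deepest) level first.
    z_q = codebooks[-1][tokens[-1]]
    for C, k in zip(reversed(codebooks[:-1]), reversed(tokens[:-1])):
        z_q = mobius_add(C[k], z_q, c)
    return tokens, z_q
```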
4. Learning Objectives and Optimization
HRQ-VAE is trained by minimizing two principal objectives:
- Reconstruction Loss: Measures discrepancy in Euclidean space,

$$\mathcal{L}_{\mathrm{recon}} = \lVert x - \hat x \rVert_2^2.$$

- Hyperbolic RQ Codebook Loss: Enforces codebook commitment and alignment at each quantization level,

$$\mathcal{L}_{\mathrm{RQ}} = \sum_{m=1}^{M} d_c\big(\mathrm{sg}[r_m],\, c^{(m)}_{k_m}\big)^2 + \beta\, d_c\big(r_m,\, \mathrm{sg}[c^{(m)}_{k_m}]\big)^2,$$

where $\mathrm{sg}[\cdot]$ denotes the stop-gradient operator and $\beta$ trades off codebook commitment; see the sketch following this list.
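A sketch combining both terms, again reusing `hyp_dist`; the squared-distance form and the default $\beta$ follow the usual VQ/RQ-VAE convention and are assumptions rather than the paper's exact coefficients:

```python
def hrq_losses(x, x_hat, residuals, chosen, c=1.0, beta=0.25):
    """Reconstruction plus hyperbolic codebook/commitment losses (sketch).

    residuals: list of r_m seen before each quantization step;
    chosen: the matching selected codewords c_{k_m}.
    """
    recon = ((x - x_hat) ** 2).sum()
    # Codebook term: pull codewords toward (stopped-gradient) residuals.
    codebook = sum(hyp_dist(r.detach(), q, c) ** 2 for r, q in zip(residuals, chosen))
    # Commitment term: pull residuals toward (stopped-gradient) codewords.
    commit = sum(hyp_dist(r, q.detach(), c) ** 2 for r, q in zip(residuals, chosen))
    return recon + codebook + beta * commit
```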
Optimization is performed with Riemannian variants of Adam or SGD, so that parameter updates respect the manifold constraints. Training typically involves thousands of epochs with learning-rate warmup and uses the straight-through estimator to pass gradients through the non-differentiable quantization step.
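As an illustration, one third-party library providing such optimizers is geoopt; its use here is an assumption, since the paper does not name its implementation:

```python
import torch
import geoopt  # Riemannian optimization library (assumed choice)

ball = geoopt.PoincareBall(c=1.0)
# Store the codebook as a manifold parameter so RiemannianAdam keeps
# every codeword inside the ball after each update.
codebook = geoopt.ManifoldParameter(
    ball.expmap0(torch.randn(256, 32) * 1e-2), manifold=ball
)
optimizer = geoopt.optim.RiemannianAdam([codebook], lr=1e-3)
```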
5. Empirical Evaluation
HRQ has been evaluated on both supervised hierarchy modeling and unsupervised hierarchy discovery:
- Supervised Hierarchy Modeling (WordNet):
- Dataset: 82,115 nouns, 743,241 hypernymy edges (transitive closure).
- Protocol: Multitoken embeddings are trained with contrastive loss on hypernym vs. non-hypernym pairs plus HRQ loss; a Transformer seq2seq model is then trained to predict hypernyms.
- Metric: Recall@10 on a held-out 15% of hypernym pairs.
- Results: For codebook size 256 (remaining hyperparameters as reported in the paper):
- Hidden dim 32: RQ 67.3% vs. HRQ 79.9% (+12.6 points, about 19% relative)
- Hidden dim 16: RQ 69.8% vs. HRQ 78.9% (+9.1 points, about 13% relative)
- HRQ consistently yields up to roughly 20% relative Recall@10 improvement over RQ.
- Unsupervised Hierarchy Discovery (Recommender Systems):
- Datasets: Amazon Reviews 2014 (multiple product categories), MovieLens-10M (LLM-generated movie descriptions).
- Protocol: Each item is encoded via a text encoder, HRQ-VAE or RQ-VAE produces multitokens, and a Transformer recommender predicts next items from these tokens.
- Metrics: Recall@K and NDCG@K at several cutoffs K.
- Representative results:
- Amazon Beauty, Recall@10: RQ 4.83% vs. HRQ 5.01% (+3.7% relative)
- MovieLens, Recall@10: RQ 25.11% vs. HRQ 25.49% (+1.5% relative)
- Across all metrics, HRQ-VAE shows 1–7% relative improvements (Piękos et al., 18 May 2025).
6. Analysis, Limitations, and Extensions
HRQ’s efficacy is rooted in the geometry of hyperbolic space:
- Hierarchical Codeword Placement: Coarse, high-level codewords settle near the Poincaré ball's center, while fine, leaf-level codewords migrate toward the boundary. Radial position thus naturally maps to hierarchy depth, encoding tree-like relationships with high fidelity.
- Möbius Operations: Manifold-specific operations preserve geodesic distances (Möbius translations are isometries of the ball), avoiding the distortions that Euclidean subtraction incurs in deep branches.
- Utilization Uniformity: The coefficient of variation of latent-vector norms is nearly halved under HRQ relative to RQ, reflecting a more uniform use of representational capacity; the metric itself is sketched after this list.
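For concreteness, the coefficient of variation referred to here can be computed as follows (a hypothetical helper, not code from the paper):

```python
def norm_cv(latents):
    """Coefficient of variation (std / mean) of latent-vector norms;
    lower values indicate more uniform use of the ball's radius."""
    norms = latents.norm(dim=-1)
    return (norms.std() / norms.mean()).item()
```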
Limitations include:
- Assumed Data Structure: HRQ presumes data with latent hierarchical structure and focuses on discrete tokenization rather than continuous generative modeling. Extensions to diffusion models or other continuous domains remain an open area.
- Computational Complexity: Encoding requires $M \times K$ hyperbolic distance evaluations per input, each more expensive than its Euclidean counterpart, and nearest-neighbor search in non-Euclidean space lacks mature acceleration structures, presenting scalability and efficiency challenges.
- Open Research Directions: Future work may explore hybrid (product) geometries, more expressive hyperbolic clustering techniques, and broader applications such as phylogenetics, knowledge graph embedding, and curriculum learning in RL.
7. Broader Significance and Future Work
Integrating hyperbolic geometry into multitoken discrete representation learning enables explicit encoding of latent hierarchies, providing a hierarchical inductive bias that is well-aligned with exponentially branching data. HRQ achieves substantial gains in both supervised and unsupervised downstream tasks, including up to 20% improvement in hierarchy modeling recall. Prospective extensions include developing scalable non-Euclidean nearest neighbor search algorithms, investigating product/hybrid manifolds, and adapting HRQ for continuous generative tasks and novel domains. The method establishes a theoretical and empirical foundation for hierarchically structured quantization in modern machine learning (Piękos et al., 18 May 2025).