Reference-Log-Linear Distances
- Reference-log-linear distances are metrics that locally linearize Riemannian geodesic distances using logarithmic and exponential maps from a fixed reference measure.
- They enable efficient embedding of complex objects like probability measures and SPD matrices into tangent Hilbert spaces, facilitating scalable computational comparisons.
- The approach maintains key geometric information and provides exact metric recovery on geodesics, making it valuable for optimal transport and data analysis.
Reference-log-linear distances are metrics or pseudo-metrics that leverage the local linearization of a geodesic distance (often Riemannian) around a chosen reference point in a metric space. This construction enables embedding complex objects—such as probability measures or positive-definite matrices—into (pre-)Hilbert spaces, facilitating computationally efficient comparisons while preserving substantial geometric information from the underlying metric. The concept arises prominently in the context of optimal transport-based geometries, especially for the Hellinger–Kantorovich (HK) metric, and is a particular example of tangent space embedding via explicit logarithmic and exponential maps (Cai et al., 2021).
1. Riemannian Metric Structure and Local Linearization
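Reference-log-linear distances exploit the Riemannian metric structure of the space of interest. For the space of non-negative Radon measures on a domain $\Omega \subset \mathbb{R}^d$, the HK metric defines a Riemannian structure via a dynamic (Benamou–Brenier-type) formulation:
$$\mathrm{HK}(\mu,\nu)^2 = \inf_{(\rho_t,\, v_t,\, \alpha_t)} \int_0^1 \int_\Omega \Big( |v_t|^2 + \frac{\kappa^2}{4}\,\alpha_t^2 \Big)\, d\rho_t\, dt,$$
where $\kappa > 0$ is a length-scale parameter. The infimum is subject to the continuity equation with source term $\partial_t \rho_t + \operatorname{div}(\rho_t v_t) = \alpha_t \rho_t$, with $\rho_0 = \mu$ and $\rho_1 = \nu$.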
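A small worked example makes the dynamic formulation concrete. Take the action $\int_0^1\!\int_\Omega \big(|v_t|^2 + \tfrac{\kappa^2}{4}\alpha_t^2\big)\,d\rho_t\,dt$ with source equation $\partial_t\rho_t + \operatorname{div}(\rho_t v_t) = \alpha_t\rho_t$, where $\kappa > 0$ is the HK length scale. Now create a measure $\nu$ from zero mass by pure growth, without any transport:

```latex
\rho_t = t^2\,\nu, \qquad v_t = 0, \qquad \alpha_t = \frac{2}{t}
\quad\Longrightarrow\quad
\partial_t \rho_t = 2t\,\nu = \alpha_t\,\rho_t,
\qquad
\int_0^1\!\int_\Omega \frac{\kappa^2}{4}\,\alpha_t^2\,d\rho_t\,dt
= \int_0^1 \frac{\kappa^2}{4}\cdot\frac{4}{t^2}\cdot t^2\,\nu(\Omega)\,dt
= \kappa^2\,\nu(\Omega).
```

This path gives $\mathrm{HK}(0,\nu)^2 \le \kappa^2\,\nu(\Omega)$. The bound is attained, so creating or destroying mass $m$ without transport costs $\kappa^2 m$ in squared distance.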
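At any reference measure $\mu$ that is absolutely continuous with respect to Lebesgue measure, the tangent space $T_\mu$ can be identified with triples $(v, \alpha, \rho^\perp)$. Here $v$ is a vector field, $\alpha$ a scalar function, and $\rho^\perp$ a measure singular with respect to $\mu$. The inner product is defined as
$$\big\langle (v_1,\alpha_1,\rho_1^\perp), (v_2,\alpha_2,\rho_2^\perp) \big\rangle_\mu = \int_\Omega \Big( \langle v_1, v_2 \rangle + \frac{\kappa^2}{4}\,\alpha_1 \alpha_2 \Big)\, d\mu + \kappa^2 \int_\Omega \sqrt{\frac{d\rho_1^\perp}{d\sigma}\,\frac{d\rho_2^\perp}{d\sigma}}\; d\sigma,$$
where $\kappa > 0$ is the HK length scale and $\sigma$ is any measure dominating the singular parts $\rho_1^\perp, \rho_2^\perp$. The value does not depend on the choice of $\sigma$ (Cai et al., 2021).
This formalism enables exact evaluation of the squared length of tangent vectors and a local linearization of the HK metric around $\mu$.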
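The sketch below evaluates this inner product for a discrete reference. It computes $\int(\langle v_1,v_2\rangle + \tfrac{\kappa^2}{4}\alpha_1\alpha_2)\,d\mu$ plus $\kappa^2$ times the Hellinger affinity of the singular parts. It assumes the reference is a weighted sum of atoms and that both singular parts live on a shared finite set of extra points, so $\sigma$ is the counting measure there. `tangent_inner` and `tangent_norm` are hypothetical helper names, not an API from Cai et al.

```python
import numpy as np

def tangent_inner(mu, tv1, tv2, kappa=1.0):
    """Discrete sketch of the HK tangent inner product at a reference mu.

    mu       : (n,) positive weights of the reference atoms.
    tv1, tv2 : tuples (v, alpha, rho_perp) with
               v        (n, d) velocities at the reference atoms,
               alpha    (n,)   growth rates at the reference atoms,
               rho_perp (m,)   singular masses on m extra points shared by both
                               vectors (sigma = counting measure on those points).
    """
    v1, a1, r1 = tv1
    v2, a2, r2 = tv2
    # Absolutely continuous part: sum_i mu_i * (<v1_i, v2_i> + kappa^2/4 * a1_i * a2_i).
    ac = np.sum(mu * (np.sum(v1 * v2, axis=1) + 0.25 * kappa**2 * a1 * a2))
    # Singular part: kappa^2 * sum_k sqrt(r1_k * r2_k), a Hellinger affinity.
    sing = kappa**2 * np.sum(np.sqrt(r1 * r2))
    return ac + sing

def tangent_norm(mu, tv, kappa=1.0):
    return np.sqrt(tangent_inner(mu, tv, tv, kappa))

# Hypothetical example: two reference atoms in 2D plus one point of created mass.
mu = np.array([0.5, 1.5])
tv = (np.array([[1.0, 0.0], [0.0, 2.0]]), np.array([0.0, 2.0]), np.array([3.0]))
sq_len = tangent_inner(mu, tv, tv)   # 0.5*1 + 1.5*(4 + 1) + 3 = 11.0
```

For a pure-creation vector ($v = 0$, $\alpha = 0$), the squared norm reduces to $\kappa^2$ times the created mass.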
2. Logarithmic and Exponential Maps in Measure Spaces
The logarithmic map $\mathrm{Log}_\mu(\nu)$ returns a vector in the tangent space at $\mu$ that represents the direction and speed of the geodesic from $\mu$ to $\nu$. Given a fixed reference measure $\mu$, the Log map of any sample $\nu$ is computed via:
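- Solution of a static soft-marginal Kantorovich problem, in which KL penalties replace the hard marginal constraints, with cost $c(x,y) = -\log\cos^2\big(\min\{|x-y|/\kappa,\ \pi/2\}\big)$ for the HK length scale $\kappa > 0$,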
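- Expression of the minimizer in Monge form, $\gamma = (\mathrm{id}, T)_{\#}\gamma_0$, where $\gamma_0$ is the first marginal of $\gamma$ and $T$ is a transport map,
- Decomposition of $\mu$ and $\nu$ with respect to the marginals $\gamma_0$ and $\gamma_1$ of the optimal plan, which separates transported mass from mass that is only destroyed or created, and
- Assembly of the tangent vector $\mathrm{Log}_\mu(\nu) = (v, \alpha, \rho^\perp)$ (see (Cai et al., 2021), Proposition 3.9, for details).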
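The first step can be made concrete by evaluating the soft-marginal objective for a given discrete plan. That objective is $\kappa^2\big(\langle C, \gamma\rangle + \mathrm{KL}(\gamma \mathbf{1} \mid \mu) + \mathrm{KL}(\gamma^\top \mathbf{1} \mid \nu)\big)$ with the HK cost above. The sketch only evaluates the objective; it does not solve the OT problem. As a check it uses the closed form for two Dirac masses, $\mathrm{HK}(m_0\delta_x, m_1\delta_y)^2 = \kappa^2\big(m_0 + m_1 - 2\sqrt{m_0 m_1}\cos(\min\{|x-y|/\kappa, \pi/2\})\big)$. The helpers `hk_cost`, `kl`, `soft_marginal_objective` and `hk_dirac_pair` are hypothetical names.

```python
import numpy as np

def hk_cost(X, Y, kappa=1.0):
    """Pairwise HK ground cost between point sets X (n, d) and Y (m, d)."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1) / kappa
    # c = -log cos^2(d) inside the cutoff; +inf once d reaches pi/2.
    return np.where(d < np.pi / 2, -np.log(np.cos(np.minimum(d, np.pi / 2)) ** 2), np.inf)

def kl(p, q):
    """Generalized KL divergence sum(p log(p/q) - p + q), with 0 log 0 = 0 (q > 0 assumed)."""
    ratio = np.where(p > 0, p / q, 1.0)
    return np.sum(p * np.log(ratio) - p + q)

def soft_marginal_objective(plan, mu, nu, C, kappa=1.0):
    """kappa^2 * (<C, plan> + KL(row sums | mu) + KL(column sums | nu))."""
    transport = np.sum(np.where(plan > 0, C, 0.0) * plan)   # skip 0 * inf entries
    return kappa**2 * (transport + kl(plan.sum(axis=1), mu) + kl(plan.sum(axis=0), nu))

def hk_dirac_pair(m0, x, m1, y, kappa=1.0):
    """Closed-form HK distance between m0*delta_x and m1*delta_y."""
    d = min(np.linalg.norm(np.subtract(x, y)) / kappa, np.pi / 2)
    return kappa * np.sqrt(m0 + m1 - 2.0 * np.sqrt(m0 * m1) * np.cos(d))

# Hypothetical check with two Diracs: the optimal plan has one entry g* = sqrt(m0*m1)*cos(d).
X, Y = np.array([[0.0]]), np.array([[0.5]])
mu, nu = np.array([1.0]), np.array([2.0])
C = hk_cost(X, Y)
g_star = np.sqrt(1.0 * 2.0) * np.cos(0.5)
obj = soft_marginal_objective(np.array([[g_star]]), mu, nu, C)
```

Setting the derivative of the one-entry objective to zero gives $g^2 = m_0 m_1 e^{-c}$. Substituting back yields the closed form, which is why the plan value `g_star` reproduces $\mathrm{HK}^2$ exactly.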
The exponential map $\mathrm{Exp}_\mu$ provides an explicit inverse of $\mathrm{Log}_\mu$, reconstructing measures from tangent vectors. For geodesics starting at $\mu$, the HK metric is given exactly by the norm of the tangent vector: $\mathrm{HK}(\mu, \nu) = \|\mathrm{Log}_\mu(\nu)\|_\mu$.
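The simplest case is a single Dirac reference $m_0\delta_x$ and a target $m_1\delta_y$ with $|x-y| < \kappa\pi/2$. Here Log and Exp can be written in closed form by a planar picture. Each Dirac $m\delta_z$ becomes a point at radius $\kappa\sqrt{m}$, and the angle between the two points is $|x-y|/\kappa$. The HK distance is then the Euclidean distance in that plane (law of cosines), and the geodesic is a straight segment. The sketch below is derived from this picture under these assumptions (single atoms, full transport, no singular part). It is not transcribed from Cai et al., and all function names are hypothetical.

```python
import numpy as np

def log_dirac(m0, x, m1, y, kappa=1.0):
    """Tangent vector (v, alpha) at m0*delta_x toward m1*delta_y, assuming |x - y| < kappa*pi/2."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dist = np.linalg.norm(y - x)
    d = dist / kappa                                  # angle between the two points in the plane
    if d >= np.pi / 2:
        raise ValueError("target lies beyond the HK transport cutoff")
    r = np.sqrt(m1 / m0)
    direction = (y - x) / dist if dist > 0 else np.zeros_like(x)
    v = kappa * r * np.sin(d) * direction             # spatial velocity
    alpha = 2.0 * (r * np.cos(d) - 1.0)               # relative growth rate
    return v, alpha

def exp_dirac(m0, x, v, alpha, kappa=1.0):
    """Follow (v, alpha) from m0*delta_x for unit time; assumes 1 + alpha/2 > 0."""
    x, v = np.asarray(x, float), np.asarray(v, float)
    s, w = 1.0 + alpha / 2.0, np.linalg.norm(v) / kappa
    d = np.arctan2(w, s)                              # angle travelled
    direction = v / np.linalg.norm(v) if w > 0 else np.zeros_like(x)
    return m0 * (s**2 + w**2), x + kappa * d * direction

def tangent_norm_dirac(m0, v, alpha, kappa=1.0):
    """sqrt(m0 * (|v|^2 + kappa^2/4 * alpha^2)): the tangent norm at a single atom."""
    return np.sqrt(m0 * (np.dot(v, v) + 0.25 * kappa**2 * alpha**2))

def hk_dirac_pair(m0, x, m1, y, kappa=1.0):
    """Closed-form HK distance between m0*delta_x and m1*delta_y."""
    d = min(np.linalg.norm(np.subtract(y, x)) / kappa, np.pi / 2)
    return kappa * np.sqrt(m0 + m1 - 2.0 * np.sqrt(m0 * m1) * np.cos(d))

# Hypothetical example in 2D with kappa = 1.
x, y = np.array([0.0, 0.0]), np.array([0.3, 0.4])
v, alpha = log_dirac(1.0, x, 2.0, y)
m_back, y_back = exp_dirac(1.0, x, v, alpha)      # recovers (2.0, y) up to round-off
```

Scaling the tangent vector by $s \in [0,1]$ moves along the same straight segment in the plane. `exp_dirac(m0, x, s*v, s*alpha)` therefore traces the geodesic, and its HK distance from the reference is $s$ times the tangent norm.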
3. Construction and Properties of the Reference-Log-Linear Distance
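The reference-log-linear distance between measures $\nu_1, \nu_2$, defined relative to a fixed reference $\mu$, is
$$\mathrm{HK}_\mu(\nu_1, \nu_2) = \big\|\mathrm{Log}_\mu(\nu_1) - \mathrm{Log}_\mu(\nu_2)\big\|_\mu.$$
Notably, for $\nu_1, \nu_2$ close to $\mu$, this linearized distance provides a first-order approximation to the full metric, $\mathrm{HK}_\mu(\nu_1, \nu_2) \approx \mathrm{HK}(\nu_1, \nu_2)$. For the special case of geodesics from $\mu$, it gives the exact metric value. In particular, $\mathrm{HK}_\mu(\mu, \nu) = \mathrm{HK}(\mu, \nu)$, since $\mathrm{Log}_\mu(\mu) = 0$.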
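A short argument shows why the linearization is exact along geodesics from the reference. Assume that $\nu_s = \mathrm{Exp}_\mu(s\,w)$, $s \in [0,1]$, is a constant-speed HK geodesic from $\mu$ with $\mathrm{Log}_\mu(\nu_s) = s\,w$. Write $\mathrm{HK}_\mu(\nu, \nu') = \|\mathrm{Log}_\mu(\nu) - \mathrm{Log}_\mu(\nu')\|_\mu$. Then:

```latex
\mathrm{HK}_\mu(\nu_s, \nu_t)
= \|s\,w - t\,w\|_\mu
= |s - t|\,\|w\|_\mu
= \mathrm{HK}(\nu_s, \nu_t), \qquad s, t \in [0, 1].
```

The last step uses the fact that a minimizing geodesic with constant speed $\|w\|_\mu = \mathrm{HK}(\mu, \nu_1)$ realizes the distance between any two of its points. Taking $s = 0$ recovers $\mathrm{HK}_\mu(\mu, \nu_t) = \mathrm{HK}(\mu, \nu_t)$.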
This construction admits an efficient computational recipe. The $N$ samples are embedded through $N$ one-to-reference OT computations, after which pairwise comparisons reduce to Euclidean operations in the tangent Hilbert space. This substantially reduces computational cost compared to $N(N-1)/2$ pairwise metric evaluations (Cai et al., 2021).
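Once the $N$ one-to-reference solves are done, all pairwise linearized distances follow from a single Gram matrix. The sketch assumes each embedding consists only of the absolutely continuous part $(v_i, \alpha_i)$ on a discrete reference with weights $\mu$, with no singular mass. Each embedding is flattened by scaling with $\sqrt{\mu}$ so that Euclidean norms equal tangent norms. `flatten_tangent`, `pairwise_linearized_distances` and `count_ot_solves` are hypothetical helpers.

```python
import numpy as np

def flatten_tangent(mu, v, alpha, kappa=1.0):
    """Feature vector whose Euclidean norm equals the tangent norm (absolutely continuous part only).

    mu: (n,) reference weights; v: (n, d) velocities; alpha: (n,) growth rates.
    """
    w = np.sqrt(mu)
    return np.concatenate([(w[:, None] * v).ravel(), 0.5 * kappa * w * alpha])

def pairwise_linearized_distances(E):
    """All pairwise ||e_i - e_j|| for the rows of E, computed from one Gram matrix."""
    G = E @ E.T
    sq = np.diag(G)
    D2 = sq[:, None] + sq[None, :] - 2.0 * G
    np.fill_diagonal(D2, 0.0)
    return np.sqrt(np.maximum(D2, 0.0))      # clip tiny negative round-off

def count_ot_solves(N):
    """(one-to-reference solves, solves needed to compare all pairs directly)."""
    return N, N * (N - 1) // 2

# Hypothetical usage: N = 5 samples, reference with n = 4 atoms in d = 2 dimensions.
rng = np.random.default_rng(0)
mu = np.full(4, 0.25)
E = np.stack([flatten_tangent(mu, rng.normal(size=(4, 2)), rng.normal(size=4)) for _ in range(5)])
D = pairwise_linearized_distances(E)
```

Because flattening is linear, $\|e_i - e_j\|^2$ equals $\int(|v_i - v_j|^2 + \tfrac{\kappa^2}{4}(\alpha_i - \alpha_j)^2)\,d\mu$, the absolutely continuous part of the linearized distance.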
4. Algorithmic Implementation and Discrete Setting
For discretely supported measures, the procedure involves:
- Solving $N$ entropically regularized unbalanced optimal transport problems (with soft marginals), one between the reference $\mu$ and each sample $\nu_i$,
- For each, barycentric projection of the transport plan to construct the tangent fields $(v_i, \alpha_i, \rho_i^\perp)$,
- Assembly of the tangent space embeddings,
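- Pairwise distances computed by the Hilbert norm. In practice, this involves summing $\int_\Omega \big(|v_i - v_j|^2 + \tfrac{\kappa^2}{4}(\alpha_i - \alpha_j)^2\big)\,d\mu$, with $\kappa > 0$ the HK length scale, and the Hellinger-type term $\kappa^2 \int \big(\sqrt{d\rho_i^\perp/d\sigma} - \sqrt{d\rho_j^\perp/d\sigma}\big)^2\,d\sigma$ for the unmatched mass.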
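The first two steps can be sketched with a generic scaling (Sinkhorn-type) algorithm for KL-penalized entropic unbalanced OT, followed by barycentric projection of the resulting plan. This is a minimal sketch under stated assumptions:

- The entropic term is taken relative to the counting measure.
- The HK objective is divided by $\kappa^2$, so both marginal penalties have weight 1.
- The iterations are not log-stabilized.

`hk_sinkhorn` and `barycentric_map` are hypothetical names, not functions from Cai et al. or from any library. Sample mass that the plan leaves unmatched would feed the singular part $\rho^\perp$, which this sketch does not assemble.

```python
import numpy as np

def hk_cost(X, Y, kappa=1.0):
    """Pairwise HK ground cost -log cos^2(|x - y| / kappa), +inf beyond kappa*pi/2."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1) / kappa
    return np.where(d < np.pi / 2, -np.log(np.cos(np.minimum(d, np.pi / 2)) ** 2), np.inf)

def hk_sinkhorn(mu, nu, C, eps=0.05, n_iter=2000):
    """Scaling iterations for
        <C, P> + eps * sum P (log P - 1) + KL(P 1 | mu) + KL(P^T 1 | nu),
    i.e. the HK soft-marginal problem divided by kappa^2, plus entropic smoothing.
    Not log-stabilized, so very small eps can underflow.
    """
    K = np.exp(-C / eps)                  # +inf cost -> zero kernel entry
    tau = 1.0 / (1.0 + eps)               # exponent induced by unit-weight KL penalties
    u, v = np.ones_like(mu), np.ones_like(nu)
    for _ in range(n_iter):
        # The floor guards atoms with no admissible partner (all-zero kernel row/column).
        u = (mu / np.maximum(K @ v, 1e-300)) ** tau
        v = (nu / np.maximum(K.T @ u, 1e-300)) ** tau
    return u[:, None] * K * v[None, :]

def barycentric_map(P, X, Y):
    """T(x_i) = sum_j P_ij y_j / sum_j P_ij; atoms with no outgoing mass stay at x_i."""
    row_mass = P.sum(axis=1)
    T = np.array(X, dtype=float)
    moved = row_mass > 0
    T[moved] = (P[moved] @ Y) / row_mass[moved][:, None]
    return T, row_mass

# Hypothetical 1D example: the sample atom at 2.0 is beyond the cutoff from both
# reference atoms, so it stays unmatched (it would feed the singular part rho_perp).
X = np.array([[0.0], [0.3]])
Y = np.array([[0.1], [0.4], [2.0]])
mu, nu = np.array([1.0, 1.0]), np.array([0.5, 1.0, 1.0])
P = hk_sinkhorn(mu, nu, hk_cost(X, Y))
T, transported = barycentric_map(P, X, Y)
```

The update exponent $1/(1+\varepsilon)$ comes from the first-order conditions of the KL-penalized problem. With hard marginal constraints the exponent would be 1, recovering balanced Sinkhorn.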
The embedding enables subsequent data analysis—such as PCA, clustering, or SVM—in the tangent Hilbert space at the reference. This approach offers an efficient, scalable surrogate to the HK geometry for applications requiring the analysis of large numbers of empirical measures.
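As one example of such downstream analysis, the sketch below runs standard PCA (centering followed by a thin SVD) on tangent embeddings. It assumes the embeddings have already been flattened so that the Euclidean inner product matches the tangent inner product at the reference, e.g. by scaling each atom's entries by $\sqrt{\mu_i}$, with singular parts ignored. `tangent_pca` is a hypothetical helper.

```python
import numpy as np

def tangent_pca(E, k):
    """PCA of N flattened tangent vectors (rows of E) via a thin SVD."""
    mean = E.mean(axis=0)
    U, S, Vt = np.linalg.svd(E - mean, full_matrices=False)
    scores = U[:, :k] * S[:k]             # coordinates along the top-k principal directions
    explained = S**2 / np.sum(S**2)       # fraction of total variance per component
    return mean, Vt[:k], scores, explained[:k]

# Hypothetical data: embeddings varying along a single tangent direction.
E = np.outer(np.arange(6.0), [1.0, 2.0, 2.0])
mean, components, scores, explained = tangent_pca(E, k=1)
```

Because the analysis happens in a linear space, the mean and the principal directions are themselves tangent vectors at the reference.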
5. Analytical and Practical Significance
The reference-log-linear approach preserves sensitivity to the unbalanced optimal transport geometry while providing access to the computational and analytical toolkit of linear spaces. The method is exact on geodesics from the reference and provides a first-order approximation near the reference, making it particularly suitable for tasks where most measures are expected to be close in HK distance to the reference.
By reducing metric structure to inner products, it enables the use of a broad suite of machine learning and data analysis methods traditionally restricted to Euclidean settings, while maintaining an interpretable connection to the original HK metric (Cai et al., 2021). It also replaces the quadratic number of OT problems required for full pairwise metric computation with a linear number of one-to-reference solves.
6. Connections to Other Linearization and Spectral Approaches
Reference-log-linear embeddings are part of a broader class of linearization techniques for geometrically complex distance spaces. Similar spectral or tangent-space techniques appear, for instance, in the log-Euclidean signatures framework for SPD matrices. There, the log-Euclidean (LE) metric is linearized by considering differences in log-spectra, and a dataset is embedded via distances to a fixed collection of reference matrices (Shnitzer et al., 2022). In both settings, the critical insight is that carefully chosen reference-based embeddings preserve local geometry and metric information to first order while enabling highly efficient downstream analysis.
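The log-Euclidean linearization can be sketched in a few lines. The matrix logarithm maps SPD matrices to symmetric matrices, where the LE distance is the Frobenius norm $d_{LE}(A, B) = \|\log A - \log B\|_F$. The helper names are hypothetical. The reference-distance embedding is a naive illustration of "distances to a fixed collection of reference matrices", not the signature construction of Shnitzer et al.

```python
import numpy as np

def spd_log(A):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    w, Q = np.linalg.eigh(A)
    return (Q * np.log(w)) @ Q.T          # Q diag(log w) Q^T

def log_euclidean_distance(A, B):
    """d_LE(A, B) = ||log A - log B||_F."""
    return np.linalg.norm(spd_log(A) - spd_log(B), ord="fro")

def reference_distance_embedding(mats, refs):
    """Embed each SPD matrix by its LE distances to a fixed list of reference matrices."""
    logs_ref = [spd_log(R) for R in refs]
    return np.array([[np.linalg.norm(spd_log(M) - L, ord="fro") for L in logs_ref] for M in mats])

# Hypothetical example with two diagonal SPD matrices and the identity as the only reference.
A = np.diag([1.0, np.e])
B = np.diag([np.e, 1.0])
emb = reference_distance_embedding([A, B], [np.eye(2)])
```

Computing each matrix logarithm once and then comparing in the flat space of symmetric matrices mirrors the one-solve-per-sample structure of the HK construction.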
A plausible implication is that such log-linearization schemes can be systematically generalized to other Riemannian metric spaces of interest in data science and geometry, wherever explicit log and exp maps and inner products in tangent spaces are accessible.