Adaptive Learned Distance Function
- The paper introduces adaptively learned distance functions as parameterized metrics that infer optimal similarity measures from data, tailoring to specific task structures.
- It details methodologies including gradient-based optimization, Bellman regression, and meta-learning to dynamically adjust geometric and statistical properties.
- Applications range from classification and reinforcement learning to anomaly detection and robotics, demonstrating improved robustness and discrimination.
An adaptively learned distance function is a parameterized function or computational process that infers optimal notions of distance or similarity from data, learning parameters directly—often through gradient-based optimization or recursive updating—so that the resulting metric is tailored to the structure, distribution, or task semantics of the observed problem. Unlike fixed or purely hand-crafted distance functions, adaptively learned distances optimize geometric, statistical, or semantic properties for superior discrimination, robustness, or generalization in high-dimensional learning, inference, or control settings.
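As a minimal, generic illustration of this idea (not any one paper's method), a Mahalanobis-style distance with a learnable factor `L` can be fit by gradient descent on a contrastive objective; all names and the margin value here are illustrative assumptions:

```python
import numpy as np

def mahalanobis(x, y, L):
    """Learned distance d_L(x, y) = ||L (x - y)||_2; M = L^T L is PSD by construction."""
    diff = L @ (x - y)
    return float(np.sqrt(diff @ diff + 1e-12))

def grad_step(L, x, y, similar, lr=0.1):
    """One SGD step: pull similar pairs together, push dissimilar pairs apart.
    Loss = d^2 for similar pairs, max(0, 1 - d)^2 for dissimilar (margin 1)."""
    z = L @ (x - y)
    d = np.sqrt(z @ z + 1e-12)
    if similar:
        grad = 2.0 * np.outer(z, x - y)               # d(d^2)/dL
    else:
        if d >= 1.0:
            return L                                  # margin satisfied, no update
        grad = -2.0 * (1.0 - d) / d * np.outer(z, x - y)
    return L - lr * grad

# toy usage: one update on a similar pair shrinks their learned distance
L = np.eye(2)
x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
d0 = mahalanobis(x, y, L)
L = grad_step(L, x, y, similar=True)
d1 = mahalanobis(x, y, L)
```

The factorization `M = L^T L` keeps the learned metric positive semidefinite without any explicit constraint, which is why it is a common parameterization in linear metric learning.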
1. Mathematical Formulations and Principles
Adaptively learned distance functions span multiple mathematical forms and domains but generally introduce learnable parameters into a canonical distance or metric structure, often coupled with a data-driven optimization objective.
- Neural Distance Metrics in Classification: In neural architectures like OffsetL2, the class score for input $x$ is computed as the (learned) p-norm distance between an embedded feature $\phi(x)$ and a learnable prototype $\mu_c$:
  $d_c(x) = \| W_c (\phi(x) - \mu_c) \|_p$,
  where $W_c$ is a learnable class-specific scaling (whitening) matrix. For OffsetL2, $p = 2$ and $W_c = \operatorname{diag}(w_c)$, with $w_c$ a learned vector of positive precisions. The model outputs $-d_c(x)$ as the logit for class $c$ (Oursland, 4 Feb 2025).
- Action-based and Bellman-driven RL Distances: In goal-conditioned RL, the learned distance $d^\pi(s, g)$ estimates the expected minimal number of actions from state $s$ to goal $g$ under policy $\pi$, optimized to solve the Bellman equation:
  $d^\pi(s, g) = \mathbb{E}_{s' \sim \pi}[\, 1 + d^\pi(s', g) \,]$ for $s \neq g$, with $d^\pi(g, g) = 0$,
  typically parameterized as $d_\theta(s, g) = \| f_\theta(s) - f_\theta(g) \|$ with a learned embedding $f_\theta$ (Venkattaramanujam et al., 2019).
- Density-adaptive and Riemannian Distances: Data-adaptive, density-weighted geodesic distances are defined from conformal Riemannian metrics of the form
  $g(x) = p_\theta(x)^{-2\beta}\, I$,
  where $p_\theta$ is a learned data density, typically estimated by normalizing flows, and $\beta$ is an inverse-temperature parameter; Riemannian geodesic paths are then refined using score-based ODEs (Sorrenson et al., 2024).
- Flexible Parameterization: Adaptive metrics can be linear (e.g., Mahalanobis distance $d_M(x, y) = \sqrt{(x - y)^\top M (x - y)}$ with $M$ learned) (Chakraborti et al., 2019, Song, 2019), nonlinear (via decision forests (Tomita et al., 2020)), or geometric (hyperbolic embeddings with pair-specific curvature and projection matrices (Li et al., 23 Jun 2025, Ma et al., 2021)).
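A sketch of the OffsetL2-style distance logit described above, assuming a diagonal per-class precision vector; the function names, shapes, and the `exp` positivity trick are illustrative assumptions, not the paper's code:

```python
import numpy as np

def offset_l2_logits(features, prototypes, log_precisions):
    """Distance-based logits: logit_c = -|| w_c * (phi(x) - mu_c) ||_2.
    features:       (batch, dim) embedded inputs phi(x)
    prototypes:     (classes, dim) learnable class centers mu_c
    log_precisions: (classes, dim) log of positive per-dimension scales w_c
    """
    w = np.exp(log_precisions)                        # positivity via exp
    diff = features[:, None, :] - prototypes[None]    # (batch, classes, dim)
    dist = np.linalg.norm(w[None] * diff, axis=-1)    # (batch, classes)
    return -dist                                      # closer prototype => larger logit

# toy usage: nearest-prototype classification from the distance logits
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
protos = rng.normal(size=(3, 8))
logits = offset_l2_logits(feats, protos, np.zeros((3, 8)))
pred = logits.argmax(axis=1)
```

Because the logit is the negated distance, a feature that coincides with its class prototype attains the maximum possible logit of zero, which is what gives the model its ellipsoidal, proximity-to-center decision geometry.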
2. Learning Algorithms and Optimization Strategies
- Direct Optimization with SGD: In neural systems, adaptive distances are learned by end-to-end backpropagation, with gradients passing both through the discriminative loss (cross-entropy, margin, or regression) and through the distance parameters themselves, such as prototypes, scaling matrices, and embeddings (Oursland, 4 Feb 2025, Chakraborti et al., 2019, Song, 2019).
- Self-supervised and Bellman Regression: In reinforcement learning, temporal difference (TD) errors provide a self-supervised signal, driving the distance function toward the expected action count. No explicit labels are needed beyond the reward structure; curriculum and off-policy data further increase diversity (Venkattaramanujam et al., 2019).
- Meta-Learning and Online Adaptation: Some approaches dynamically update or meta-learn distance function parameters online. In iterative Approximate Bayesian Computation (ABC), the summary statistic weighting in the distance function is adaptively recalculated per iteration, reflecting changing variance as the parameter proposal shifts toward the posterior (Prangle, 2015).
- Score-matching and Geometric Relaxation: For manifold-based or density-adaptive distances, score matching may be used to learn the gradient field of the log-density, enabling variational geodesic refinement beyond graph shortest-paths (Sorrenson et al., 2024, Lin et al., 2014).
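The Bellman-regression idea above can be sketched in tabular form: on a deterministic chain, temporal-difference updates drive a distance table toward the expected number of actions to the goal, with no labels beyond the transitions themselves. The chain environment, step size, and episode count here are illustrative assumptions:

```python
import numpy as np

# Tabular sketch of Bellman-style distance regression on a 1-D chain of states
# 0..N-1 with goal g = N-1. The table D approximates the expected number of
# actions to reach the goal under the behavior policy (here: always step right).
N = 6
goal = N - 1
D = np.zeros(N)            # learned distance estimates d(s, goal)
alpha = 0.5                # TD step size

for _ in range(200):       # replay many trajectories
    s = 0
    while s != goal:
        s_next = s + 1     # deterministic "step right" policy
        # TD target: one action plus the bootstrapped distance from s_next
        target = 1.0 + (0.0 if s_next == goal else D[s_next])
        D[s] += alpha * (target - D[s])
        s = s_next
```

Under this deterministic policy the fixed point is exact: `D[s]` converges to `goal - s`, the true action count. A function approximator replacing the table gives the embedding-based parameterization used in goal-conditioned RL.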
3. Geometric and Statistical Frameworks
- Prototype Geometry and Logical Structure: Distance-based neural models such as OffsetL2 induce quadratic (ellipsoidal) decision boundaries, corresponding to explicit proximity to class centers; this stands in contrast to intensity-based or hyperplane-based approaches that must encode similar regions implicitly and may suffer from combinatorial sub-optimality in high dimensions (Oursland, 4 Feb 2025).
- Density-adaptive Geodesics: Data geometry is incorporated explicitly by making the local metric tensor depend inversely on the data density, causing shortest paths to avoid low-density regions and adhere to the data manifold (Sorrenson et al., 2024).
- Adaptive Hyperbolic Geometry: In hierarchical data, adaptively learned hyperbolic metrics endow each pair with its own curvature and projection, allowing the metric to locally fit the semantic separation structure and providing sharper class or prototype boundaries (Ma et al., 2021, Li et al., 23 Jun 2025).
- Semantics-aware Distance Fields: In manipulation and robot safety, task semantics are encoded via boundary conditions or potentials in a PDE-based distance function (e.g., Laplace equation in a Kelvin-inverted domain), adaptively reflecting regions where contact is allowed or forbidden (Muchacho et al., 2024).
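The density-adaptive geodesic idea can be sketched at the graph stage: scale each k-NN edge length by an inverse power of the local density, so shortest paths prefer high-density regions. This is a simplified illustration assuming a known density array; the score-based ODE refinement step is omitted:

```python
import heapq
import numpy as np

def density_weighted_geodesic(points, density, src, dst, k=4, beta=1.0):
    """Graph approximation of a density-adaptive geodesic: edge (i, j) costs
    Euclidean length times the inverse (mean) density at its endpoints raised
    to beta, so paths through dense regions are cheaper."""
    n = len(points)
    adj = [[] for _ in range(n)]
    for i in range(n):
        d2 = np.sum((points - points[i]) ** 2, axis=1)
        for j in np.argsort(d2)[1:k + 1]:             # k nearest neighbors of i
            mid_density = 0.5 * (density[i] + density[int(j)])
            w = float(np.sqrt(d2[j]) * mid_density ** (-beta))
            adj[i].append((int(j), w))
            adj[int(j)].append((i, w))
    # Dijkstra from src
    dist = [np.inf] * n
    dist[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist[dst]

# toy usage: three collinear points; doubling the density halves the geodesic cost
pts = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
g_uniform = density_weighted_geodesic(pts, np.ones(3), 0, 2, k=2)
g_dense = density_weighted_geodesic(pts, 2.0 * np.ones(3), 0, 2, k=2)
```

With uniform unit density the geodesic reduces to ordinary Euclidean shortest-path distance, which makes the conformal weighting easy to sanity-check.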
4. Applications Across Domains
- Classification and Recognition: OffsetL2 (MNIST: 97.33–97.61% accuracy) and DML-CRC architectures show consistently higher accuracy and robustness compared to fixed metric or intensity-based schemes, particularly in fine-grained or high-variance settings (Oursland, 4 Feb 2025, Chakraborti et al., 2019).
- Goal-conditioned RL and Planning: Learned action-based distances enable goal-attainment even when state or goal-space geometry is unknown or discontinuous, substantially improving both sample efficiency and robustness to reward sparsity (Venkattaramanujam et al., 2019).
- Robust Similarity and Kernel Learning: Decision-forest-based distance functions (SMERF) fit nonlinear, locally adaptive metrics, achieving high AUC in link prediction and precise recovery of underlying structures in simulated tasks (Tomita et al., 2020).
- Anomaly and Out-of-distribution Detection: Siamese networks trained to discriminate pairs provide a flexible, transferable anomaly score; adaptation occurs in the exemplar set, not (necessarily) in the network weights (Ramachandra et al., 2020).
- Robot Control and Collision Checking: Neural distance fields (N-CEDF) for continuum robots combine adaptation at the network and module level, enabling real-time, safe motion planning even with dynamic environments and low-latency constraints (Long et al., 2024).
- LLM-generated Text Detection: Learn-to-Distance applies adaptive metric learning to maximize the discrimination between original and rewritten text, achieving up to 80.6% improvement in AUC over fixed-metric baselines on diverse LLMs (Zhou et al., 29 Jan 2026).
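The exemplar-based anomaly scoring mentioned above can be sketched as follows: the score is the distance from an embedded input to its nearest "normal" exemplar, so adapting to a new scene means swapping the exemplar set rather than retraining. The identity embedding here is a stand-in assumption for a trained Siamese branch:

```python
import numpy as np

def anomaly_score(x, exemplars, embed):
    """Score = distance from embed(x) to its nearest normal exemplar.
    Larger scores indicate inputs unlike anything seen as normal."""
    z = embed(x)
    zs = np.stack([embed(e) for e in exemplars])
    return float(np.min(np.linalg.norm(zs - z, axis=1)))

# stand-in embedding (a trained Siamese network branch would go here)
embed = lambda v: np.asarray(v, dtype=float)

normal = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
s_in = anomaly_score(np.array([0.9, 0.1]), normal, embed)   # near an exemplar
s_out = anomaly_score(np.array([5.0, 5.0]), normal, embed)  # far from all exemplars
```

Thresholding the score then yields a detector whose notion of "normal" is carried entirely by the exemplar set.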
5. Robustness, Adaptivity, and Theoretical Guarantees
- Adaptive Robustness: Adaptive distance methods demonstrate reduced sensitivity to noise and outliers. For instance, geodesic distances on graphs are provably more robust to spurious edges and structural noise than classical Dijkstra shortest-path distances. This robustness can be further enhanced by learning node potentials or features as part of the ODE-driven computation (Azad et al., 2024).
- Generalization Across Tasks: Adaptive distances naturally interpolate between classical methods. Variants of Adaptive Nearest Neighbor (ANN) recover LMNN, NCA, and pairwise metric learning as limiting cases (through hyperparameters), but achieve higher accuracy across diverse data because the metric search space is broader (Song, 2019).
- Computational Efficiency: Low-rank parameterization, modularization (per-link in N-CEDF), and hard-pair mining strategically focus adaptation and computation, ensuring scalability to large data regimes or real-time applications while bounding errors (Long et al., 2024, Li et al., 23 Jun 2025).
- Theoretical Underpinnings: Consistency and optimality guarantees are provided for several frameworks: random-forest-based distance learning is consistent under additive and conditional variance models (Tomita et al., 2020); meta-theorems transform non-adaptive estimators into adaptive-query-safe data structures with controlled overhead (Cherapanamjeri et al., 2020).
6. Open Challenges and Future Directions
- Scalability and Memory: High-dimensional and combinatorial settings may challenge even low-rank approximations; extensions to meshless or basis-expansion methods are open research directions (Muchacho et al., 2024, Long et al., 2024).
- Generalization Beyond Observed Domains: The extent to which learned distances generalize to out-of-distribution samples or unseen subspaces remains an active area. Adaptive threshold selection, uncertainty quantification, and dynamic update schemes are promising avenues (Venkattaramanujam et al., 2019, Li et al., 23 Jun 2025).
- Integration with Downstream Tasks: Joint distance–representation and distance–policy optimization (e.g., merging distance to goal with value learning) hold promise for further dramatic gains in sample efficiency and compositionality (Venkattaramanujam et al., 2019).
- Automated Selection and Interpretability: Feature-importance scoring and explicit interpretability remain better understood in tree/forest models; extending similar transparency without sacrificing adaptivity in deep neural and geometric models is a significant open question (Tomita et al., 2020, Zhou et al., 29 Jan 2026).
In summary, adaptively learned distance functions constitute a foundational mechanism across contemporary machine learning and control, enabling geometry-aware, robust, and task-specific metrics that outperform fixed baselines in accuracy, sample efficiency, and robustness across classification, regression, structure learning, link prediction, anomaly detection, text forensics, control, and reinforcement learning (Oursland, 4 Feb 2025, Venkattaramanujam et al., 2019, Muchacho et al., 2024, Sorrenson et al., 2024, Prangle, 2015, Song, 2019, Li et al., 23 Jun 2025, Azad et al., 2024, Tomita et al., 2020, Long et al., 2024, Ramachandra et al., 2020, Cherapanamjeri et al., 2020, Chakraborti et al., 2019, Lin et al., 2014, Ma et al., 2021, Zhou et al., 29 Jan 2026).