
IDMap Framework: Feature-Aligned Diffusion Maps

Updated 16 November 2025
  • IDMap is a geometric data analysis method that iteratively refines embeddings by deforming the data manifold with locally adaptive anisotropic kernels.
  • It employs local linear regression to estimate Jacobians and tangent spaces, aligning the Riemannian metric with directions where the feature map varies most.
  • Empirical results show that IDMap robustly outperforms isotropic diffusion maps, effectively collapsing irrelevant dimensions in noisy, high-dimensional data.

The Iterated Diffusion Map (IDMap) framework is a geometric data analysis method designed to identify, extract, and emphasize features of interest in high-dimensional data lying on manifolds. By iteratively deforming the intrinsic geometry of the data through adaptive, anisotropic kernels guided by the feature map’s local Jacobian, IDMap produces low-dimensional embeddings that reflect target features while removing irrelevant directions. This approach generalizes classical diffusion maps by leveraging local covariance structures that align manifold geometry to the feature map, and provides rigorous tools for tangent space estimation, intrinsic dimension selection, and robust manifold learning, especially in settings involving product manifolds or degenerate mappings.

1. Anisotropic Local Kernel Construction

Let $\mathcal{M} \subset \mathbb{R}^m$ be a $d$-dimensional manifold and $\mathcal{H}: \mathcal{M} \to \mathcal{N} \subset \mathbb{R}^n$ a feature map. The central innovation of IDMap is the use of local kernels:

$$K_\epsilon(x, y) = \exp\left(-\tfrac{1}{2}\,(x-y)^\top C(x)^{-1} (x-y)/\epsilon\right)$$

with a data-dependent, anisotropic covariance $C(x) \in \mathbb{R}^{m \times m}$. The local geometry is modulated in the directions along which $\mathcal{H}$ varies most strongly. The covariance is chosen to satisfy:

$$c(x)^{-1} = \mathcal{I}(x)\, C(x)^{-1}\, \mathcal{I}(x)^\top = D\mathcal{H}(x)^\top D\mathcal{H}(x)$$

where $\mathcal{I}(x)$ projects onto the tangent space $T_x\mathcal{M}$, and $D\mathcal{H}(x)$ is the Jacobian of $\mathcal{H}$. For computational stability, a convex combination is used:

$$C_{\mathcal{H}^{(0)}}(x) = \left[(1-\tau)\,I_m + \tau\, D\mathcal{H}(x)^\top D\mathcal{H}(x)\right]^{-1}, \qquad 0 < \tau \ll 1$$

These kernels induce, in the continuum limit, a Riemannian metric on $\mathcal{M}$:

$$\tilde{g}(u, v) = g_\mathcal{M}\left(c(x)^{-1/2}u,\; c(x)^{-1/2}v\right)$$

such that diffusion distances and the resulting embeddings become increasingly sensitive to the feature directions of $\mathcal{H}$.
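As a concrete illustration, the kernel above can be evaluated directly once a Jacobian estimate is available at each point. The following is a minimal NumPy sketch (function name and array layout are illustrative, not from the paper); note that the kernel needs only $C(x)^{-1} = (1-\tau)I_m + \tau\, D\mathcal{H}^\top D\mathcal{H}$, so no matrix inversion is performed:

```python
import numpy as np

def anisotropic_kernel(X, jacobians, eps=0.1, tau=0.5):
    """Evaluate K_eps(x_i, x_j) with the regularized covariance
    C(x_i) = [(1 - tau) I_m + tau DH(x_i)^T DH(x_i)]^{-1}.

    X         : (N, m) data points
    jacobians : (N, n, m) Jacobian estimates DH(x_i)
    """
    N, m = X.shape
    K = np.empty((N, N))
    for i in range(N):
        J = jacobians[i]
        # Inverse covariance in closed form; no inversion needed.
        C_inv = (1.0 - tau) * np.eye(m) + tau * (J.T @ J)
        diff = X - X[i]                                   # (N, m)
        quad = np.einsum('nj,jk,nk->n', diff, C_inv, diff)
        K[i] = np.exp(-0.5 * quad / eps)
    return K
```

Because $C$ varies with the base point, the raw matrix is not symmetric; in practice it is symmetrized before the Laplacian normalization, as in the pseudocode below.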

2. Estimation of Local Jacobian and Tangent Spaces

Given sample data $\{x_i\}$ and observed features $\{y_i = \mathcal{H}(x_i)\}$, IDMap estimates $D\mathcal{H}(x)$ at each $x$ using local linear regression. For each $x$, its $k$ nearest neighbors $x_{I(j)}$ are selected, and weighted centered differences are constructed:

$$dx_j = \exp\left[-\|x_{I(j)}-x\|^2/(4\epsilon)\right](x_{I(j)}-x)$$

$$dy_j = \exp\left[-\|x_{I(j)}-x\|^2/(4\epsilon)\right](y_{I(j)}-y)$$

These are assembled into matrices $X \in \mathbb{R}^{m \times k}$ and $Y \in \mathbb{R}^{n \times k}$. The regression

$$\widehat{D\mathcal{H}}(x) = Y X^\top (X X^\top)^{-1}$$

yields a first-order estimate of the projected Jacobian with $O(\epsilon)$ error under regularity conditions. The SVD of $X$ at each point provides information about the local tangent space and the rates of change associated with $\mathcal{H}$.
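The regression step can be sketched as follows. A least-squares solve replaces the explicit normal equations $YX^\top(XX^\top)^{-1}$ for numerical stability; the two are algebraically equivalent when $X$ has full row rank (function name and argument layout are illustrative):

```python
import numpy as np

def estimate_jacobian(x, y, X_nbrs, Y_nbrs, eps=0.1):
    """Estimate DH(x) by weighted local linear regression.

    x, y   : base point (m,) and its feature value (n,)
    X_nbrs : (k, m) the k nearest neighbors of x
    Y_nbrs : (k, n) their feature values
    """
    # Gaussian weights applied to the centered differences.
    w = np.exp(-np.sum((X_nbrs - x) ** 2, axis=1) / (4.0 * eps))
    dX = (w[:, None] * (X_nbrs - x)).T        # (m, k)
    dY = (w[:, None] * (Y_nbrs - y)).T        # (n, k)
    # Solves the same problem as Y X^T (X X^T)^{-1}, but stably.
    G, *_ = np.linalg.lstsq(dX.T, dY.T, rcond=None)
    return G.T                                 # (n, m)
```

For an exactly linear map $\mathcal{H}(x) = Ax$ the centered differences satisfy $dY = A\,dX$ regardless of the weights, so the estimate recovers $A$ to machine precision.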

3. Dimension Selection and Bandwidth Scaling Laws

Analysis of the SVD of $X$ over a neighborhood provides both the intrinsic local dimension and optimal bandwidth selection. In the $d$ tangent directions, singular values scale as $\sigma_\ell \sim \sqrt{\epsilon}$, while the remaining singular values scale as $O(\epsilon)$. This scaling is quantified by the exponents:

$$\alpha_\ell = \frac{d \ln \sigma_\ell}{d \ln \epsilon}, \qquad \alpha_\ell \approx \tfrac{1}{2} \text{ in tangent directions}$$

Aggregated scaling exponents give determinant-based dimension estimates, e.g.,

$$d_2(\epsilon) = 2\sum_{\ell=1}^{\lfloor d_1\rfloor} \alpha_\ell + 2\,(d_1-\lfloor d_1\rfloor)\, \alpha_{\lfloor d_1\rfloor+1}$$

The bandwidth $\epsilon$ is selected locally by maximizing $d_1(\epsilon)$ (density-based) or minimizing a consistency criterion:

$$M(\epsilon) = \left| \frac{d_1 - d_2}{(d_1 + d_2)/2} \right| + \left| \frac{d\ln d_1}{d\ln\epsilon} \right| + \left| \frac{d\ln d_2}{d\ln\epsilon} \right|$$

This provides a principled approach to local geometry adaptation and regularization.
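A sketch of the exponent estimation follows: for each bandwidth on a grid, form the weighted centered differences around $x$, record their singular values, and fit the slopes of $\ln\sigma_\ell$ against $\ln\epsilon$. Normalizing the difference matrix by the total weight is an implementation choice made here (not spelled out above) so that tangent singular values exhibit the $\sqrt{\epsilon}$ scaling:

```python
import numpy as np

def scaling_exponents(x, X_nbrs, eps_grid):
    """Estimate alpha_l = d ln(sigma_l) / d ln(eps) by least squares."""
    log_sigmas = []
    for eps in eps_grid:
        w = np.exp(-np.sum((X_nbrs - x) ** 2, axis=1) / (4.0 * eps))
        # Weighted centered differences, normalized by total weight.
        dX = (w[:, None] * (X_nbrs - x)).T / np.sqrt(w.sum())
        sigma = np.linalg.svd(dX, compute_uv=False)
        log_sigmas.append(np.log(sigma + 1e-300))  # guard against log(0)
    log_eps = np.log(np.asarray(eps_grid, dtype=float))
    # Slope of each ln(sigma_l) column against ln(eps).
    return np.polyfit(log_eps, np.array(log_sigmas), deg=1)[0]
```

On a line in $\mathbb{R}^2$ (a one-dimensional manifold), the first exponent comes out near $1/2$ and the remaining direction carries no signal, matching the scaling law above.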

4. Iterated Diffusion Map Algorithm and Geometric Flow Interpretation

The core IDMap procedure iteratively reshapes data geometry to emphasize the designated feature:

  1. At each iteration $\ell$:
    • For each data point, estimate $D\mathcal{H}^{(\ell)}(x)$.
    • Construct an adaptive kernel using the updated $C_{\mathcal{H}^{(\ell)}}(x)$.
    • Build the sparse kernel matrix and normalize it to a graph Laplacian.
    • Compute the top $M$ diffusion map coordinates.
    • Rescale the embeddings:

$$x_i^{(\ell+1)} = (2\pi)^{d/4}(4s)^{d/4+1/2} \left(e^{s \ln \xi_1}\varphi_1(x_i^{(\ell)}),\ \ldots,\ e^{s \ln \xi_M}\varphi_M(x_i^{(\ell)})\right)^\top$$

  2. Repeat for $T$ iterations.

The process induces a discrete approximation to a geometric flow:

$$\dot{g} = -g + \tfrac{1}{2}\left(D\mathcal{H}^\top D\mathcal{H}\,g + g\,D\mathcal{H}^\top D\mathcal{H}\right)$$

which collapses the directions in $\ker D\mathcal{H}$ (those along which the feature does not vary), isolating the submanifold that encodes the feature of interest.

Pseudocode Summary:

Input: {x_i}, {y_i = H(x_i)}, T, τ
Initialize x_i^(0) = x_i
For ℓ = 0,...,T-1:
    For each x_i^(ℓ):
        Find k neighbors, estimate d_i, q_i, DĤ_i
        Build C_i = [(1−τ) I + τ DĤ_i^T DĤ_i]^{-1}
    Build kernel K_ij = exp[−(x_j−x_i)^T C_i^{-1}(x_j−x_i)/(2ε)]
    Normalize to graph Laplacian
    Compute top M eigenpairs (ξ_r,φ_r)
    Set x_i^(ℓ+1) = diffusion map embedding using (ξ_r,φ_r)
Output: Ψ(x_i) = x_i^(T)
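The pseudocode can be transcribed into a compact NumPy sketch. This is illustrative, not the paper's implementation: the normalization constants of the rescaling step are omitted, a dense eigensolver stands in for the sparse one, and the Jacobian step reuses the weighted regression of Section 2:

```python
import numpy as np

def idmap(X, Y, T=4, tau=0.5, eps=0.1, M=10, k=20, s=1.0):
    """Dense sketch of the IDMap loop. X: (N, m) data; Y: (N, n) features.
    Returns an (N, M) embedding after T iterations."""
    N = X.shape[0]
    Z = X.astype(float).copy()
    for _ in range(T):
        m = Z.shape[1]
        K = np.empty((N, N))
        for i in range(N):
            d2 = np.sum((Z - Z[i]) ** 2, axis=1)
            idx = np.argsort(d2)[1:k + 1]             # k nearest neighbors
            w = np.exp(-d2[idx] / (4.0 * eps))
            dX = (w[:, None] * (Z[idx] - Z[i])).T     # weighted differences
            dY = (w[:, None] * (Y[idx] - Y[i])).T
            # G.T estimates DH; C_inv = (1 - tau) I + tau DH^T DH.
            G, *_ = np.linalg.lstsq(dX.T, dY.T, rcond=None)
            C_inv = (1.0 - tau) * np.eye(m) + tau * (G @ G.T)
            diff = Z - Z[i]
            K[i] = np.exp(-0.5 * np.einsum('nj,jk,nk->n', diff, C_inv, diff) / eps)
        K = 0.5 * (K + K.T)                           # symmetrize
        P = K / K.sum(axis=1)[:, None]                # Markov normalization
        vals, vecs = np.linalg.eig(P)
        order = np.argsort(-vals.real)[1:M + 1]       # drop the trivial pair
        xi, phi = vals.real[order], vecs.real[:, order]
        Z = phi * np.power(np.clip(xi, 1e-12, None), s)   # e^{s ln xi} phi_r
    return Z
```

A toy run on annulus data with the radius as the feature exercises the full loop end to end.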

5. Convergence Properties and Theoretical Guarantees

Suppose $\mathcal{M} = \mathcal{N} \times \mathcal{P}$ is a product manifold and $\mathcal{H}(y, z) = y$. Locally, $D\mathcal{H} = \mathrm{diag}(I_{d_\mathcal{N}}, 0)$. The geometric flow satisfies:

$$\dot{g}_\mathcal{N} = 0, \qquad \dot{g}_\mathcal{P} = -g_\mathcal{P}$$

As $t \to \infty$, the irrelevant factor $\mathcal{P}$ collapses and IDMap produces an embedding isometric to the feature manifold $\mathcal{N}$. In settings where the feature is not an exact quotient, IDMap empirically yields a lower-dimensional embedding whose neighborhoods track the feature geometry. Theoretical results establish convergence to the quotient manifold for product structures, and near-isometric recovery up to a linear alignment of coordinates.
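Solving the decoupled flow explicitly makes the collapse quantitative:

$$g_\mathcal{N}(t) = g_\mathcal{N}(0), \qquad g_\mathcal{P}(t) = e^{-t}\, g_\mathcal{P}(0)$$

so the length of any curve lying in a $\mathcal{P}$ fiber shrinks as $e^{-t/2}$, while lengths along $\mathcal{N}$ are preserved; in the limit only the feature geometry survives.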

6. Empirical Results and Robustness

Empirical evaluation spans several canonical manifolds:

  • Annulus $(r, \theta)$: target feature = radius $r$. After $T=4$ iterations ($\tau=0.65$, $M=250$, $k=500$), the angular component is suppressed and the embedding recovers $r$. Nearest-neighbor tests show convergence of feature neighborhoods.
  • Torus $\mathbb{T}^2$: target feature = either circle factor. The irrelevant circle is collapsed ($\tau=0.4$ or $0.65$), yielding 1D embeddings aligned with the chosen circle, again after 4 iterations.
  • Sphere $S^2$: target feature = the $x$-coordinate or more complex functions. IDMap ($\tau \approx 0.6$–$0.7$) contracts the sphere to a 1D embedding aligned with the feature.
  • Performance metrics include neighborhood-recovery accuracy (fraction of true feature neighbors captured among the $k$ nearest neighbors), embedding distortion, and SVD scree plots.
  • Robustness: the method tolerates moderate ambient Gaussian noise and high curvature. Satisfactory operation depends on proper tuning of the local bandwidth $\epsilon$ and pseudotime $\tau$.

In all cases, IDMap outperforms isotropic diffusion maps, which fail to suppress the irrelevant factors; this supports the claim that local anisotropic adaptation aligned with the feature is essential for targeted manifold learning.
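The neighborhood-recovery metric described above is straightforward to compute; a sketch (function name and signature are illustrative) counting the overlap between $k$-nearest-neighbor sets in the embedding and in feature space:

```python
import numpy as np

def neighborhood_recovery(emb, feature, k=10):
    """Mean fraction of true feature neighbors captured among the
    k nearest neighbors in the embedding.

    emb     : (N, M) embedding coordinates
    feature : (N, p) target feature values
    """
    N = emb.shape[0]
    overlap = 0.0
    for i in range(N):
        d_emb = np.sum((emb - emb[i]) ** 2, axis=1)
        d_feat = np.sum((feature - feature[i]) ** 2, axis=1)
        nb_emb = set(np.argsort(d_emb)[1:k + 1])    # skip the point itself
        nb_feat = set(np.argsort(d_feat)[1:k + 1])
        overlap += len(nb_emb & nb_feat) / k
    return overlap / N
```

A perfect embedding of the feature scores 1.0; an unrelated embedding scores near the chance level $k/(N-1)$.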

Table: Key Algorithmic Elements of IDMap

| Component | Mathematical Expression / Procedure | Purpose |
| --- | --- | --- |
| Anisotropic kernel | $K_\epsilon(x, y)$ as above | Induces a feature-aligned metric |
| Jacobian estimation | $\widehat{D\mathcal{H}}(x)$ via weighted regression | Adapts local geometry |
| Iterative update | $x_i^{(\ell+1)} =$ rescaled diffusion coordinates | Drives the geometric flow toward the feature |
| Dimension selection | SVD scaling laws; maximize $d_1(\epsilon)$ or minimize $M(\epsilon)$ | Adapts bandwidth, selects $d$ |

IDMap is situated in the lineage of spectral manifold learning methods, extending classical diffusion maps [Coifman & Lafon] by adaptively deforming geometry using supervised feature information. It unifies concepts from kernelized Laplacian approximation, data-driven Riemannian metrics, and geometric flows, and builds on precedents such as local kernels [LK], vector diffusion maps [Singer & Wu], and neighborhood regression.

Key implications:

  • IDMap enables feature-focused embedding and manifold quotienting where only a subset of variables are relevant.
  • The local scaling and bandwidth adaptation procedures provide robust practical tools for high-curvature, noisy, or high-codimension datasets.
  • A plausible implication is that this methodology can be extended to unsupervised feature selection by iteratively aligning kernels to data-driven discriminants.

Potential limitations include sensitivity to the kernel bandwidth and amplification of noise when the feature signal is weak relative to the irrelevant coordinates. Further study of convergence rates, global alignment, and scaling to massive data regimes remains an open direction.
