Trajectory Deviation Index in Neural Encoders

Updated 1 May 2026
  • Trajectory Deviation Index (TDI) is a diagnostic metric that quantifies the isotropic path-length distortion in neural representations by leveraging Gaussian perturbations and Jacobian sensitivity.
  • It directly measures the unavoidable geometric blind spot in ERM-trained encoders and offers a uniform regularization framework across layers and architectures.
  • TDI uniquely isolates average isotropic drift, distinguishing it from traditional metrics like the Jacobian Frobenius norm and aiding improvements in vision and language models.

The Trajectory Deviation Index (TDI) is a quantitative diagnostic introduced to measure the isotropic path-length distortion induced by supervised learning objectives in neural representation spaces. TDI directly quantifies the geometric blind spot of empirical risk minimization (ERM): the necessary nonzero Jacobian sensitivity in directions correlated with training labels but considered nuisance at test time. Unlike aggregate sensitivity or adversarial robustness metrics, TDI isolates and captures the precise geometric effect that is mathematically unavoidable in ERM-trained encoders, providing both a diagnosis and a means for uniform regularization across deep networks, vision, and LLMs (Rajput, 23 Apr 2026).

1. Formal Definition and Mathematical Formulation

Let $\phi: \mathbb{R}^d \to \mathbb{R}^m$ be a (potentially multi-layer) encoder. For a random isotropic perturbation $\delta \sim \mathcal{N}(0,\sigma^2 I_d)$ and input $x \sim P$, TDI probes the encoder's roughness by evaluating the expected normalized squared deviation through all intermediate layer outputs ($\phi^{(1:\ell)}(x)$ for $\ell=1\ldots L$):

$$\mathrm{TDI}(\phi, \sigma) := \frac{1}{L} \sum_{\ell=1}^L \frac{ \mathbb{E}_{x,\delta}\left\| \phi^{(1:\ell)}(x+\delta) - \phi^{(1:\ell)}(x) \right\|^2 }{ \mathbb{E}_x \left\| \phi^{(1:\ell)}(x) \right\|^2 }$$

In the zero-noise limit, with $\sigma \to 0$, this reduces to a Jacobian-normalized isometry measure:

$$\mathrm{TDI}(\phi, 0) := \lim_{\sigma \to 0^+} \mathrm{TDI}(\phi, \sigma) \approx \frac{\sigma^2}{L} \sum_{\ell=1}^L \frac{ \mathbb{E}_x \| J_{\phi^{(1:\ell)}}(x) \|_F^2 }{ \mathbb{E}_x \| \phi^{(1:\ell)}(x) \|^2 }$$

where $J_{\phi^{(1:\ell)}}(x)$ is the Jacobian for the $\ell$-th layer representation. At the final layer, this is the standard embedding drift. TDI therefore quantifies the isotropic path-length distortion along all directions, which is precisely the quantity ERM cannot minimize to zero due to its geometric blind spot (Rajput, 23 Apr 2026).
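
The Jacobian form follows from a first-order Taylor expansion. A short sketch, assuming each $\phi^{(1:\ell)}$ is differentiable at $x$ and writing $J_\ell(x)$ as shorthand (introduced here, not in the source) for $J_{\phi^{(1:\ell)}}(x)$:

```latex
\begin{align*}
\phi^{(1:\ell)}(x+\delta) - \phi^{(1:\ell)}(x)
  &= J_\ell(x)\,\delta + O(\|\delta\|^2), \\
\mathbb{E}_\delta \left\| J_\ell(x)\,\delta \right\|^2
  &= \mathbb{E}_\delta\, \mathrm{tr}\!\left( J_\ell(x)\,\delta\delta^\top J_\ell(x)^\top \right)
   = \sigma^2\, \mathrm{tr}\!\left( J_\ell(x)\, J_\ell(x)^\top \right)
   = \sigma^2 \left\| J_\ell(x) \right\|_F^2 .
\end{align*}
```

The second line uses $\mathbb{E}[\delta\delta^\top] = \sigma^2 I_d$. Taking the expectation over $x$ and dividing by $\mathbb{E}_x \| \phi^{(1:\ell)}(x) \|^2$ gives the $\sigma^2$-scaled Jacobian ratio for each layer, up to corrections of higher order in $\sigma$.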

2. Theoretical Underpinnings: Theorem 1 and Proposition 5

Theorem 1 (Geometric Incompleteness) states that any ERM-trained encoder […] in the Gaussian correlated-nuisance model must satisfy

[…]

where […] quantifies label–nuisance correlation, […] is the Lipschitz constant of the decoder, and […] is a data-dependent constant. Thus, ERM enforces a geometric lower bound: no encoder learned by ERM is truly isometric along nuisance directions.

Proposition 5 demonstrates that, among all zero-mean perturbations, only isotropic Gaussian noise $\delta \sim \mathcal{N}(0,\sigma^2 I_d)$ minimizes the expected squared norm […] uniformly over all directions. Thus, a Gaussian-noise-based penalty uniquely provides uniform suppression of the encoder Jacobian norm.

Combining these, TDI emerges as the unique diagnostic for the bounded isotropic drift established by Theorem 1, and Gaussian perturbation regularization—as in PMH—provides the minimal mechanism for its control (Rajput, 23 Apr 2026).

3. Comparison to Alternative Geometric and Robustness Metrics

TDI is distinct from metrics such as the Jacobian Frobenius norm, centered kernel alignment (CKA), intrinsic dimension, or adversarial robustness measures. Table 1 summarizes typical empirical observations (CIFAR-10/ViT, Task 04):

Metric ERM VAT PGD-4/255 PMH
CKA (vs ERM) 0.91 0.88
Intr. dim. 42.3 44.1 38.7
Jac Fro 34.58 5.01 2.91 8.08
TDI@0 1.093 1.276 1.336 0.904

CKA and intrinsic dimension shift little between methods. The Jacobian Frobenius norm ranks PGD adversarial training as most effective (2.91, versus 34.58 for ERM), yet TDI alone records that PGD actually worsens clean-input isotropic geometry, raising TDI@0 from 1.093 under ERM to 1.336, while PMH regularization yields the smoothest representations (0.904). Thus, adversarial metrics, which focus on worst-case loss or maximal local sensitivity, fail to capture the average isotropic drift, while TDI isolates this direction-agnostic geometric fragility (Rajput, 23 Apr 2026).
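
One property that separates TDI from the raw Jacobian Frobenius norm is the normalization in its definition: dividing by the expected squared activation norm makes TDI invariant to rescaling the encoder output, while the Frobenius norm grows with the scale. The toy check below is only a sketch of that property. The linear encoder `W`, the inputs `x`, and the helper names are hypothetical, and it does not reproduce the table.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1000, 8))   # hypothetical inputs, one row per sample
W = rng.normal(size=(8, 4))      # hypothetical linear encoder: phi(x) = x @ W
sigma = 0.1

def jac_fro_sq(W):
    # For a linear map the Jacobian is W itself, independent of x.
    return np.sum(W ** 2)

def tdi_linear(W, x, sigma):
    # Single-layer TDI for a linear encoder, using the exact identity
    # E||delta @ W||^2 = sigma^2 * ||W||_F^2 for delta ~ N(0, sigma^2 I),
    # normalized by the mean squared norm of the clean representation.
    return sigma ** 2 * jac_fro_sq(W) / np.mean(np.sum((x @ W) ** 2, axis=1))

for scale in (1.0, 10.0):
    print(scale, jac_fro_sq(scale * W), tdi_linear(scale * W, x, sigma))
```

Scaling the encoder by 10 multiplies the squared Frobenius norm by 100 but leaves the TDI ratio unchanged, so the two metrics need not rank models the same way.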

4. Practical Computation and Estimation

TDI@0 can be estimated efficiently. For a batch of inputs $x$ and layers $\ell = 1,\ldots,L$, with a small evaluation noise $\sigma$ and […] perturbation samples per input (typically […] if […] is large):

  1. Initialize accumulators for each layer $\ell$.
  2. For each input $x$ in the batch, sample $\delta \sim \mathcal{N}(0,\sigma^2 I_d)$.
    • Compute $x + \delta$.
    • For each $\ell$, evaluate $\phi^{(1:\ell)}(x+\delta)$ and accumulate $\| \phi^{(1:\ell)}(x+\delta) - \phi^{(1:\ell)}(x) \|^2$.
    • Accumulate the squared norm of $\phi^{(1:\ell)}(x)$.
  3. For each $\ell$, compute the ratio of the mean squared deviation to the mean squared clean norm.
  4. Average over layers for the final TDI estimate.

For […], Taylor errors are negligible ([…]) (Rajput, 23 Apr 2026).
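
The steps above can be sketched in NumPy as follows. This is a minimal illustration, not the reference implementation: the function name `tdi_estimate`, its arguments, and the toy two-layer encoder are assumptions introduced here, and the encoder is represented as a list of per-layer callables so that every prefix $\phi^{(1:\ell)}$ is available.

```python
import numpy as np

def tdi_estimate(layers, x, sigma, num_samples=1, rng=None):
    """Monte Carlo estimate of TDI(phi, sigma) following the definition.

    layers: list of callables; applying the first l of them gives phi^(1:l).
    x: array of shape (batch, d). Returns (tdi, per_layer_ratios).
    """
    rng = np.random.default_rng() if rng is None else rng

    def prefix_outputs(inp):
        # Collect the representation after every layer.
        outs, h = [], inp
        for f in layers:
            h = f(h)
            outs.append(h)
        return outs

    clean = prefix_outputs(x)
    dev_sum = np.zeros(len(layers))      # accumulated squared deviations
    for _ in range(num_samples):
        delta = rng.normal(0.0, sigma, size=x.shape)   # isotropic Gaussian noise
        noisy = prefix_outputs(x + delta)
        for l in range(len(layers)):
            dev_sum[l] += np.sum((noisy[l] - clean[l]) ** 2)

    mean_dev = dev_sum / (num_samples * x.shape[0])
    mean_norm = np.array([np.sum(c ** 2) for c in clean]) / x.shape[0]
    per_layer = mean_dev / mean_norm     # ratio of expectations, per layer
    return per_layer.mean(), per_layer

# Toy usage with a hypothetical two-layer encoder (weights are random).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16)) / np.sqrt(8)
W2 = rng.normal(size=(16, 4)) / np.sqrt(16)
toy_layers = [lambda h: np.tanh(h @ W1), lambda h: h @ W2]
x = rng.normal(size=(512, 8))
tdi, per_layer = tdi_estimate(toy_layers, x, sigma=0.05, num_samples=4, rng=rng)
print(tdi, per_layer)
```

The per-layer ratios are returned alongside the average, since they are the same quantities a layer-wise TDI analysis would inspect.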

5. Empirical Findings and Diagnostic Power

Measured TDI@0 (clean-input, […]) values span vision and language settings:

Setting                  ERM     PGD-4/255   PMH
Task04 (CIFAR ViT)       1.093   1.336       0.904
Task01 (CIFAR ResNet)    1.074   1.336       0.904
BERT SST-2 (Pert-B)      0.496   -           0.354
ViT-B/16 ImageNet        1.230   -           0.936

PMH reduces TDI by up to 28.7% on BERT/SST-2 and 23.9% on ImageNet ViT. The blind-spot ratio (TDI_parallel / TDI_signal) falls monotonically across LLM scales: 0.860 (66M), 0.765 (110M), 0.742 (340M) under ERM. Task-specific ERM fine-tuning can increase TDI by 54%, whereas PMH fine-tuning can reduce it by 11x (Rajput, 23 Apr 2026).
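
As a quick arithmetic check, the relative reductions can be recomputed from the ERM and PMH entries of the table. The snippet below is a sketch whose dictionary and helper name are introduced here. From the rounded table values it gives about 28.6% for BERT/SST-2 (the quoted 28.7% presumably reflects unrounded measurements) and 23.9% for ImageNet ViT.

```python
# TDI@0 values copied from the table above, as (ERM, PMH) pairs.
tdi_values = {
    "BERT SST-2": (0.496, 0.354),
    "ViT-B/16 ImageNet": (1.230, 0.936),
    "Task04 (CIFAR ViT)": (1.093, 0.904),
}

def relative_reduction(erm, pmh):
    # Percentage drop in TDI when moving from ERM to PMH.
    return 100.0 * (erm - pmh) / erm

reductions = {name: relative_reduction(*pair) for name, pair in tdi_values.items()}
for name, value in reductions.items():
    print(f"{name}: {value:.1f}%")
```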

6. TDI as a Regularizer: PMH Approach

Proposition 5 leads to the PMH objective, combining the primary task loss […] and a Gaussian-noise matching penalty:

[…]

The overall fine-tuning objective:

[…]

where […] is a warm-up ramp and […] is capped to maintain a fixed proportion of […] to the total loss. This single regularization term, with no contrastive or decoder requirement, suffices to drive TDI toward its theoretical minimum, fully repairing the geometric blind spot across tasks and architectures (Rajput, 23 Apr 2026).
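
The exact penalty and schedule are not reproduced here, so the following is only one plausible reading of the description, and every specific in it is an assumption. The penalty is taken to be the mean squared difference between clean and noise-perturbed representations, the ramp is linear, and the weight is capped so that the penalty contributes at most a fraction `max_fraction` of the total loss. The names `pmh_penalty`, `ramp`, `capped_weight`, `total_loss`, `lam`, `warmup_steps`, and `max_fraction` are hypothetical.

```python
import numpy as np

def pmh_penalty(encoder, x, sigma, rng):
    # ASSUMED form: mean squared representation change under Gaussian noise.
    delta = rng.normal(0.0, sigma, size=x.shape)
    return np.mean(np.sum((encoder(x + delta) - encoder(x)) ** 2, axis=1))

def ramp(step, warmup_steps):
    # ASSUMED linear warm-up from 0 to 1.
    return min(1.0, step / warmup_steps)

def capped_weight(task_loss, penalty, lam, max_fraction):
    # Largest weight w <= lam such that
    # w * penalty <= max_fraction * (task_loss + w * penalty).
    if penalty <= 0.0:
        return lam
    cap = max_fraction * task_loss / ((1.0 - max_fraction) * penalty)
    return min(lam, cap)

def total_loss(task_loss, penalty, step, lam=1.0, warmup_steps=100, max_fraction=0.3):
    w = ramp(step, warmup_steps) * capped_weight(task_loss, penalty, lam, max_fraction)
    return task_loss + w * penalty, w

# Toy usage with a hypothetical linear encoder and a made-up task loss value.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
x = rng.normal(size=(64, 8))
penalty = pmh_penalty(lambda h: h @ W, x, sigma=0.1, rng=rng)
loss, weight = total_loss(task_loss=0.7, penalty=penalty, step=50)
print(loss, weight)
```

In an actual training loop the same quantities would be computed in an autodiff framework so that gradients flow through the penalty. NumPy is used here only to keep the arithmetic visible.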

Empirical guidance for applying TDI and PMH regularization includes:

  • Estimating […]: Choose the largest […] that preserves clean accuracy, avoiding catastrophic under-suppression due to strong asymmetry.
  • T-alignment: Always match evaluation TDI noise scale to training […]. TDI versus […] curves peak on the diagonal.
  • Multi-scale PMH: If deployment […] is unknown, cycling […] over a range uniformly penalizes the Frobenius norm. In practice, a single large […] achieves […] of the multi-scale effect.
  • Subspace diagnostics: The dominant nuisance subspace can be estimated as […] for further geometric insight.
  • Layer-wise TDI: TDI’s per-layer version can guide targeted regularization.
  • Comprehensive diagnostics: Combine TDI with CKA, intrinsic dimension, and Jacobian Frobenius for a multi-faceted geometric evaluation. Notably, TDI alone detects PGD's anisotropic patching failure mode (Rajput, 23 Apr 2026).

TDI thus serves both as a foundational diagnostic for the geometric limits of ERM and as the guiding metric for a minimal, mathematically principled regularizer. It is applicable across supervised, adversarial, and foundation-model pretraining regimes.

References (1)
