Trajectory Deviation Index in Neural Encoders
- Trajectory Deviation Index (TDI) is a diagnostic metric that quantifies the isotropic path-length distortion in neural representations by leveraging Gaussian perturbations and Jacobian sensitivity.
- It directly measures the unavoidable geometric blind spot in ERM-trained encoders and offers a uniform regularization framework across layers and architectures.
- TDI uniquely isolates average isotropic drift, distinguishing it from traditional metrics like the Jacobian Frobenius norm and aiding improvements in vision and language models.
The Trajectory Deviation Index (TDI) is a quantitative diagnostic introduced to measure the isotropic path-length distortion induced by supervised learning objectives in neural representation spaces. TDI directly quantifies the geometric blind spot of empirical risk minimization (ERM): the necessary nonzero Jacobian sensitivity in directions correlated with training labels but considered nuisance at test time. Unlike aggregate sensitivity or adversarial robustness metrics, TDI isolates the precise geometric effect that is mathematically unavoidable in ERM-trained encoders, providing both a diagnosis and a means for uniform regularization across deep networks for vision and language, including LLMs (Rajput, 23 Apr 2026).
1. Formal Definition and Mathematical Formulation
Let $f$ be a (potentially multi-layer) encoder with intermediate layer outputs $h_\ell$ for $\ell = 1, \dots, L$. For an input $x$ and a random isotropic perturbation $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$, TDI probes the encoder's roughness by evaluating the expected normalized squared deviation of every $h_\ell$ between $x$ and $x + \epsilon$.
In the zero-noise limit, with $\sigma \to 0$, this reduces to a Jacobian-normalized isometry measure expressed through $J_\ell$, the Jacobian for the $\ell$-th layer representation.
At the final layer, this is the standard embedding drift. TDI therefore quantifies the isotropic path-length distortion along all directions, which is precisely the quantity ERM cannot minimize to zero due to its geometric blind spot (Rajput, 23 Apr 2026).
2. Theoretical Underpinnings: Theorem 1 and Proposition 5
Theorem 1 (Geometric Incompleteness) states that any ERM-trained encoder $f$ in the Gaussian correlated-nuisance model must satisfy a strictly positive lower bound on its Jacobian sensitivity along nuisance directions. The bound is stated in terms of $\rho$, which quantifies label–nuisance correlation, $L_g$, the Lipschitz constant of the decoder, and $c$, a data-dependent constant. Thus, ERM enforces a geometric lower bound: no encoder learned by ERM is truly isometric along nuisance directions.
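Proposition 5 demonstrates that, among all zero-mean perturbations, only isotropic Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ minimizes the expected squared norm of the perturbation response, $\mathbb{E}\,\lVert J\epsilon \rVert^2$ with $J$ the encoder Jacobian, uniformly over all directions. Thus, a Gaussian-noise-based penalty uniquely provides uniform suppression of the encoder Jacobian norm.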
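One standard identity helps make the link between Gaussian perturbations and the Jacobian norm concrete. As a sketch, assume the encoder is locally linear around an input with Jacobian $J$, so that a perturbation $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$ moves the representation by approximately $J\epsilon$. The symbols $J$, $\epsilon$, and $\sigma$ are notation chosen for this illustration, and the local-linearity assumption is a simplification, not a statement of the proposition itself:

```latex
\begin{aligned}
\mathbb{E}\,\lVert J\epsilon \rVert^2
  &= \mathbb{E}\,\operatorname{tr}\!\left(J \epsilon \epsilon^\top J^\top\right)
   = \operatorname{tr}\!\left(J\, \mathbb{E}\!\left[\epsilon \epsilon^\top\right] J^\top\right) \\
  &= \sigma^2 \operatorname{tr}\!\left(J J^\top\right)
   = \sigma^2 \lVert J \rVert_F^2 .
\end{aligned}
```

Because the covariance of $\epsilon$ is $\sigma^2 I$, every input direction enters the trace with the same weight, which is one way to read the claim that the penalty suppresses the Jacobian norm uniformly over directions.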
Combining these, TDI emerges as the unique diagnostic for the bounded isotropic drift established by Theorem 1, and Gaussian perturbation regularization, as in PMH, provides the minimal mechanism for its control (Rajput, 23 Apr 2026).
3. Comparison to Alternative Geometric and Robustness Metrics
TDI is distinct from metrics such as the Jacobian Frobenius norm, centered kernel alignment (CKA), intrinsic dimension, or adversarial robustness measures. Table 1 summarizes typical empirical observations (CIFAR-10/ViT, Task 04):
| Metric | ERM | VAT | PGD-4/255 | PMH |
|---|---|---|---|---|
| CKA (vs ERM) | — | 0.91 | — | 0.88 |
| Intr. dim. | 42.3 | 44.1 | — | 38.7 |
| Jac Fro | 34.58 | 5.01 | 2.91 | 8.08 |
| TDI@0 | 1.093 | 1.276 | 1.336 | 0.904 |
CKA and intrinsic dimension shift little between methods. The Jacobian Frobenius norm ranks PGD adversarial training as most effective (2.91 versus 34.58 for ERM), yet only TDI records that PGD actually worsens clean-input isotropic geometry, raising TDI@0 from 1.093 under ERM to 1.336, while PMH regularization yields the smoothest representations (0.904). Thus, adversarial metrics, which focus on worst-case loss or maximal local sensitivity, fail to capture the average isotropic drift, whereas TDI isolates this direction-agnostic geometric fragility (Rajput, 23 Apr 2026).
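The disagreement between the two metrics can be checked directly from the numbers in Table 1. The short sketch below only re-sorts the tabulated values; the dictionary layout and the function name `rank_by` are illustrative choices, not part of the source.

```python
# Values copied from Table 1 (CIFAR-10/ViT, Task 04); lower means smoother.
table = {
    "ERM": {"jac_fro": 34.58, "tdi0": 1.093},
    "VAT": {"jac_fro": 5.01, "tdi0": 1.276},
    "PGD-4/255": {"jac_fro": 2.91, "tdi0": 1.336},
    "PMH": {"jac_fro": 8.08, "tdi0": 0.904},
}


def rank_by(metric):
    """Return method names sorted from lowest to highest value of `metric`."""
    return sorted(table, key=lambda name: table[name][metric])


jac_ranking = rank_by("jac_fro")
tdi_ranking = rank_by("tdi0")
print("Jacobian Frobenius:", jac_ranking)
print("TDI@0:", tdi_ranking)
```

PGD comes first under the Jacobian Frobenius norm and last under TDI@0, while PMH moves from third to first, which is the reversal described in the text.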
4. Practical Computation and Estimation
TDI@0 can be estimated efficiently. For a batch $B$ and layers $\ell = 1, \dots, L$, with a small evaluation noise $\sigma$ and $K$ perturbation samples per input (typically $K = 1$ if $|B|$ is large):
- Initialize accumulators for each layer $\ell$.
- For each $x \in B$, sample $\epsilon \sim \mathcal{N}(0, \sigma^2 I)$.
- Compute the perturbed input $x' = x + \epsilon$.
- For each $\ell$, evaluate $h_\ell(x')$ and accumulate $\lVert h_\ell(x') - h_\ell(x) \rVert^2$.
- Accumulate the squared norm of the normalizing term.
- For each $\ell$, compute the mean ratio of perturbed to original norms.
- Average over layers for the final TDI estimate.
For sufficiently small $\sigma$, Taylor errors are negligible (Rajput, 23 Apr 2026).
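The steps above can be sketched as a small Monte Carlo loop. This is a minimal illustration under stated assumptions, not the reference implementation: the encoder is represented as a plain list of layer functions, the helper names (`layer_outputs`, `estimate_tdi`) are hypothetical, and each layer's squared displacement is normalized by the squared norm of the input perturbation, which is an assumed choice because the exact normalizer is not recoverable from the text.

```python
import math
import random


def sq_norm(v):
    """Squared Euclidean norm of a list of floats."""
    return sum(a * a for a in v)


def layer_outputs(layers, x):
    """Run x through the layers in order and return every intermediate output."""
    outputs = []
    h = x
    for layer in layers:
        h = layer(h)
        outputs.append(h)
    return outputs


def estimate_tdi(layers, batch, sigma=1e-3, num_samples=1, seed=0):
    """Monte Carlo TDI estimate averaged over inputs, samples, and layers.

    ASSUMPTION: the per-layer squared displacement is divided by the squared
    norm of the input perturbation; the source's normalizer may differ.
    """
    rng = random.Random(seed)
    ratio_sums = [0.0] * len(layers)  # one accumulator per layer
    count = 0
    for x in batch:
        clean = layer_outputs(layers, x)
        for _ in range(num_samples):
            eps = [rng.gauss(0.0, sigma) for _ in x]
            perturbed = layer_outputs(layers, [a + e for a, e in zip(x, eps)])
            eps_sq = sq_norm(eps)
            for i, (h_clean, h_pert) in enumerate(zip(clean, perturbed)):
                diff = [p - c for p, c in zip(h_pert, h_clean)]
                ratio_sums[i] += sq_norm(diff) / eps_sq
            count += 1
    per_layer = [s / count for s in ratio_sums]
    return sum(per_layer) / len(per_layer), per_layer


# Toy two-layer encoder (hypothetical): scale by 2, then elementwise tanh.
toy_layers = [
    lambda v: [2.0 * a for a in v],
    lambda v: [math.tanh(a) for a in v],
]
toy_batch = [[0.0, 0.0, 0.0], [0.1, -0.2, 0.05]]
tdi, per_layer = estimate_tdi(toy_layers, toy_batch, sigma=1e-3, num_samples=4)
print(tdi, per_layer)
```

In the toy example the first layer is linear with gain 2, so its ratio is 4 up to floating-point error, and the saturating tanh layer can only lower that value. Returning the per-layer list alongside the average keeps the layer-wise view available for targeted diagnostics.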
5. Empirical Findings and Diagnostic Power
Measured TDI@0 (clean-input) values span vision and language settings:
| Setting | ERM | PGD-4/255 | PMH |
|---|---|---|---|
| Task04 (CIFAR ViT) | 1.093 | 1.336 | 0.904 |
| Task01 (CIFAR ResNet) | 1.074 | 1.336 | 0.904 |
| BERT SST-2 (Pert-B) | 0.496 | — | 0.354 |
| ViT-B/16 ImageNet | 1.230 | — | 0.936 |
PMH reduces TDI by up to 28.7% on BERT/SST-2 and 23.9% on ImageNet ViT. The blind-spot ratio (TDI_parallel / TDI_signal) falls monotonically across LLM scales: 0.860 (66M), 0.765 (110M), 0.742 (340M) under ERM. Task-specific ERM fine-tuning can increase TDI by 54%, whereas PMH fine-tuning can reduce it by 11x (Rajput, 23 Apr 2026).
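The quoted reductions can be recomputed from the ERM and PMH columns of the table. The sketch below is plain arithmetic on the tabulated values; since those values are rounded to three decimals, the recomputed figure can differ from the quoted one in the last digit (about 28.6% versus the quoted 28.7% for BERT/SST-2).

```python
# ERM and PMH TDI@0 values copied from the table above.
results = {
    "BERT SST-2": (0.496, 0.354),
    "ViT-B/16 ImageNet": (1.230, 0.936),
}


def relative_reduction(erm, pmh):
    """Percentage by which PMH lowers TDI relative to ERM."""
    return 100.0 * (erm - pmh) / erm


reductions = {name: relative_reduction(*vals) for name, vals in results.items()}
for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower TDI under PMH")
```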
6. TDI as a Regularizer: PMH Approach
Proposition 5 leads to the PMH objective, which combines the primary task loss $\mathcal{L}_{\text{task}}$ with a Gaussian-noise matching penalty $\mathcal{L}_{\text{PMH}}$ computed from Gaussian-perturbed inputs. The overall fine-tuning objective adds this penalty to the task loss, using a warm-up ramp $w(t)$ and a coefficient $\lambda$ that is capped to maintain a fixed proportion of $\mathcal{L}_{\text{PMH}}$ to the total loss. This single regularization term, with no contrastive or decoder requirement, suffices to drive TDI toward its theoretical minimum, fully repairing the geometric blind spot across tasks and architectures (Rajput, 23 Apr 2026).
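One way to picture the warm-up ramp and the capped coefficient is the scalar sketch below. Everything in it is an assumption made for illustration: the linear ramp, the rule that caps the penalty's share of the total loss, the default values (`warmup_steps=1000`, `max_fraction=0.3`), and all function names are hypothetical, and the task loss and penalty are passed in as plain numbers standing in for quantities computed during training.

```python
def warmup_ramp(step, warmup_steps):
    """Linear ramp from 0 to 1 over `warmup_steps` (assumed schedule)."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)


def capped_coefficient(task_loss, penalty, base_coeff, max_fraction):
    """Largest coefficient <= base_coeff that keeps the penalty's share of the
    total loss at or below `max_fraction` (assumed capping rule)."""
    if penalty <= 0.0:
        return base_coeff
    # share = c*P / (T + c*P) <= f   <=>   c <= f*T / ((1 - f)*P)
    cap = max_fraction * task_loss / ((1.0 - max_fraction) * penalty)
    return min(base_coeff, cap)


def total_loss(task_loss, penalty, step, warmup_steps=1000,
               base_coeff=1.0, max_fraction=0.3):
    """Task loss plus the ramped, capped Gaussian-noise penalty."""
    coeff = capped_coefficient(task_loss, penalty, base_coeff, max_fraction)
    weight = warmup_ramp(step, warmup_steps) * coeff
    return task_loss + weight * penalty, weight


loss, weight = total_loss(task_loss=0.5, penalty=2.0, step=5000)
print(loss, weight)
```

With these inputs the cap is active, so after warm-up the penalty contributes exactly the assumed 30% share of the total loss; at step 0 the ramp removes the penalty entirely.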
7. Implementation Guidance and Recommended Usage
Empirical guidance for applying TDI and PMH regularization includes:
- Estimating $\sigma$: Choose the largest $\sigma$ that preserves clean accuracy, avoiding catastrophic under-suppression due to strong asymmetry.
- T-alignment: Always match the evaluation TDI noise scale to the training $\sigma$. TDI versus noise-scale curves peak on the diagonal.
- Multi-scale PMH: If the deployment $\sigma$ is unknown, cycling $\sigma$ over a range uniformly penalizes the Frobenius norm. In practice, a single large $\sigma$ achieves much of the multi-scale effect.
- Subspace diagnostics: The dominant nuisance subspace can be estimated for further geometric insight.
- Layer-wise TDI: TDI’s per-layer version can guide targeted regularization.
- Comprehensive diagnostics: Combine TDI with CKA, intrinsic dimension, and Jacobian Frobenius for a multi-faceted geometric evaluation. Notably, TDI alone detects PGD's anisotropic patching failure mode (Rajput, 23 Apr 2026).
TDI thus serves both as a foundational diagnostic for the geometric limits of ERM and as the guiding metric for a minimal, mathematically principled regularizer. It is applicable across supervised, adversarial, and foundation-model pretraining regimes.