Unsupervised ID-based Algorithms for Comparing Models, Task Complexity, and Generated Text

Develop an unsupervised algorithm that leverages intrinsic dimension (ID) estimates of hidden representations in transformer-based large language models to perform (i) model comparison, (ii) task-complexity comparison across datasets, and (iii) generated-text comparison, treating ID as the core metric for each of these comparative analyses.
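As a starting point, such an algorithm needs an ID estimator that can be applied to a matrix of hidden-state vectors and a comparison rule built on top of it. The sketch below is a minimal, hypothetical illustration: it implements the TwoNN intrinsic-dimension estimator (Facco et al., 2017), a standard choice for this kind of analysis, and a toy `compare_by_id` helper. The comparison criterion (lower final-layer ID indicating a more "committed" representation) is an assumption motivated by the negative ID–accuracy correlations discussed below, not a method specified by the paper.

```python
import numpy as np

def two_nn_id(X: np.ndarray) -> float:
    """Estimate intrinsic dimension with the TwoNN estimator.

    X: (n_points, n_features) array, e.g. hidden representations
    collected from one layer of a transformer.
    """
    # Squared pairwise distances via the Gram-matrix identity,
    # clipped at zero to absorb floating-point noise.
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(d2, np.inf)  # exclude self-distances
    d2.sort(axis=1)
    # Ratio of 2nd- to 1st-nearest-neighbor distances for each point.
    mu = np.sqrt(d2[:, 1] / d2[:, 0])
    # Maximum-likelihood estimate: d = n / sum(log mu).
    return len(X) / float(np.sum(np.log(mu)))

def compare_by_id(reps_a: np.ndarray, reps_b: np.ndarray) -> float:
    """Illustrative unsupervised comparison (hypothetical criterion):
    a negative value means reps_a has lower estimated ID than reps_b,
    which, under the paper's observed trend, would suggest a more
    focused / task-committed representation."""
    return two_nn_id(reps_a) - two_nn_id(reps_b)
```

In practice, `reps_a` and `reps_b` would hold hidden states from two models on the same inputs (model comparison), one model on two datasets (task-complexity comparison), or human vs. generated text; the production TwoNN recipe also discards the largest ratios before the fit, which this sketch omits for brevity.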

Background

The paper investigates the geometry of decision-making in transformer-based LLMs by estimating intrinsic dimension (ID) across layers and tasks. The authors find consistent hump-shaped ID trends aligned with the emergence of decisiveness and propose that ID can serve as a proxy for representational focus and task commitment.

While the authors observed negative correlations between final-layer ID and accuracy within specific model families, these results were not robust across all models and datasets. Motivated by these findings, they explicitly note that turning ID into a concrete, unsupervised algorithm for comparative purposes—across models, tasks, and generated text—remains unresolved and is a direction for future work.

References

  1. "Though our work highlights the ID estimates showing a strong relation with model generalization, exploiting them to develop a concrete unsupervised algorithm for model comparison/task complexity comparison and generated text comparison remains open for future avenues."
Geometry of Decision Making in Language Models (2511.20315 - Joshi et al., 25 Nov 2025) in Appendix, Section "Additional Results, Discussion and Future Directions" — Future Directions, Item 6