- The paper demonstrates that small KL divergence does not guarantee similar internal representations even with near-optimal likelihood scores.
- The work introduces a novel log-likelihood variance (LLV) distance that links distributional closeness with representational similarity through SVD-based bounds.
- Experimental results reveal that wider neural networks learn representations closer to the identifiability class, supporting the proposed theoretical framework.
This paper investigates the relationship between the closeness of probability distributions generated by deep neural networks and the similarity of their learned internal representations. It approaches this problem from the perspective of identifiability theory, which suggests that a meaningful measure of representational similarity should be invariant to transformations that do not alter the model's output distribution.
The work focuses on a class of models widely used in tasks like autoregressive language modeling, supervised classification, and contrastive predictive coding. This model class defines a conditional distribution $p(y \mid x) \propto \exp(f(x)^{\top} g(y))$, where $f(x)$ is an embedding function and $g(y)$ is an unembedding function, both mapping to a shared representation space. A known identifiability result for this class states that, under a "diversity" condition (embeddings/unembeddings spanning the representation space), two models generate identical conditional distributions if and only if their embeddings and unembeddings are related by a specific type of invertible linear transformation ($f(x) = A f'(x)$ and $g_0(y) = A^{-\top} g_0'(y)$ for displaced versions $g_0, g_0'$). This linear equivalence defines an "identifiability class" for representations.
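A minimal toy sketch of this model class and its linear invariance is given below. The arrays `F` and `G` are hypothetical stand-ins for the embeddings $f(x)$ and unembeddings $g(y)$ evaluated on a finite set of inputs and labels; the sketch only illustrates that transforming $f \mapsto A f$ and $g \mapsto A^{-\top} g$ leaves the conditional distribution unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4              # representation dimension (hypothetical)
n_x, n_y = 8, 5    # number of inputs and labels in this toy example

# Toy stand-ins: row i of F is f(x_i), row j of G is g(y_j).
F = rng.normal(size=(n_x, d))
G = rng.normal(size=(n_y, d))

def conditional(F, G):
    """p(y | x) proportional to exp(f(x)^T g(y)), normalized over y per row."""
    logits = F @ G.T                              # (n_x, n_y) matrix of f(x)^T g(y)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Applying an invertible A as f -> A f, g -> A^{-T} g leaves p(y|x) unchanged,
# since (A f)^T (A^{-T} g) = f^T g.
A = rng.normal(size=(d, d)) + 3 * np.eye(d)   # an invertible, well-conditioned map
F_t = F @ A.T                 # rows become (A f(x))^T
G_t = G @ np.linalg.inv(A)    # rows become (A^{-T} g(y))^T

print(np.allclose(conditional(F, G), conditional(F_t, G_t)))  # True
```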
The central question explored is whether models whose output distributions are close (not necessarily identical) also have similar representations, meaning they are close to being in the same identifiability class.
The paper first demonstrates that using the standard Kullback--Leibler (KL) divergence as a measure of distributional closeness is insufficient to guarantee representational similarity. Theorem 3 shows that two models can have an arbitrarily small KL divergence between their conditional distributions while their embeddings remain far from being linearly equivalent. A key corollary (Corollary 1) extends this, showing that models achieving arbitrarily small negative log-likelihood loss on a dataset can still learn vastly dissimilar representations. This highlights that achieving similar performance or likelihood does not automatically imply similar internal structure. The authors empirically observe this phenomenon in models trained on CIFAR-10, where models with similar test loss exhibit permutations in their learned class embeddings.
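For concreteness, the distributional closeness in this negative result can be read as the input-averaged KL divergence between the two models' conditionals. The helper below is only an illustrative sketch; averaging uniformly over a finite set of inputs is an assumption of this example, not necessarily the paper's exact formulation.

```python
import numpy as np

def mean_conditional_kl(p, p_prime, eps=1e-12):
    """Average over inputs of KL(p(. | x) || p'(. | x)).

    p, p_prime: arrays of shape (n_inputs, n_labels) whose rows are the two
    models' conditional distributions over labels for the same inputs."""
    kl_per_input = np.sum(p * (np.log(p + eps) - np.log(p_prime + eps)), axis=1)
    return float(kl_per_input.mean())
```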
To address the limitations of KL divergence, the paper defines a new distributional distance called the log-likelihood variance (LLV) distance, $d^{\lambda}_{\mathrm{LLV}}$. This distance is based on the variance of weighted differences in log-likelihoods: for each label, it considers the difference between that label's log-probability and the log-probability of a pivot label, compares this quantity across models, and normalizes by its variance across inputs. Alongside this, the paper defines a representational dissimilarity measure, $d_{\mathrm{SVD}}$, based on the singular values of the cross-covariance matrix of appropriately transformed (scaled and centered) representations (related to PLS-SVD).
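A rough numerical sketch of the two ingredients follows. Both functions are stand-ins: the exact $\lambda$-weighting and normalization of the LLV distance, and the exact scaling and aggregation used for $d_{\mathrm{SVD}}$, are not reproduced here, so `llv_proxy` and `cross_cov_singular_values` should be read as illustrative approximations rather than the paper's definitions.

```python
import numpy as np

def pivot_log_ratios(p, pivot=0):
    """log p(y|x) - log p(y_pivot|x) for each label y and input x.
    p: (n_inputs, n_labels) array of conditional probabilities."""
    logp = np.log(p)
    return logp - logp[:, [pivot]]

def llv_proxy(p, p_prime, pivot=0, eps=1e-12):
    """Stand-in for the LLV distance: variance over inputs of the difference in
    pivot-relative log-likelihoods between two models, normalized per label by
    the variance of the first model's ratios, then averaged over labels."""
    h, h_prime = pivot_log_ratios(p, pivot), pivot_log_ratios(p_prime, pivot)
    per_label = np.var(h - h_prime, axis=0) / (np.var(h, axis=0) + eps)
    return float(np.sqrt(per_label.mean()))

def cross_cov_singular_values(F, F_prime):
    """PLS-SVD-style comparison: singular values of the cross-covariance of
    centered, per-dimension-scaled representations. Larger leading singular
    values indicate stronger linear alignment; the paper aggregates these into
    d_SVD in a way not reproduced here."""
    Z = (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-12)
    Z_prime = (F_prime - F_prime.mean(axis=0)) / (F_prime.std(axis=0) + 1e-12)
    C = Z.T @ Z_prime / len(Z)            # cross-covariance matrix
    return np.linalg.svd(C, compute_uv=False)
```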
The main theoretical contribution (Theorem 4) proves that closeness in the $d^{\lambda}_{\mathrm{LLV}}$ distance does imply representational similarity as measured by $d_{\mathrm{SVD}}$. Specifically, it provides an upper bound on the maximum of the $d_{\mathrm{SVD}}$ between linearly transformed embeddings ($L^{\top} f(x)$ and $L'^{\top} f'(x)$) and linearly transformed unembeddings ($N^{\top} g(y)$ and $N'^{\top} g'(y)$), showing this bound is proportional to the $d^{\lambda}_{\mathrm{LLV}}$ distance between the model distributions. This establishes a concrete link between distributional closeness and representational similarity under the proposed metrics.
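Suppressing constants and conditions, the statement described above has roughly the following shape:

$$
\max\Big( d_{\mathrm{SVD}}\big(L^{\top} f,\ L'^{\top} f'\big),\ d_{\mathrm{SVD}}\big(N^{\top} g,\ N'^{\top} g'\big) \Big) \;\le\; C \, d^{\lambda}_{\mathrm{LLV}}(p, p'),
$$

where $L, L', N, N'$ are the linear transformations referred to above and $C$ is a proportionality constant whose exact form is not made explicit in this summary.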
Experimental results support these theoretical findings. On synthetic data, wider neural networks are shown to learn distributions that are closer according to $d^{\lambda}_{\mathrm{LLV}}$ and also exhibit greater representational similarity (lower $d_{\mathrm{SVD}}$). This suggests that increased model capacity may encourage the learning of representations closer to the identifiability class, promoting similarity across different training runs or architectures. The paper also empirically validates the bound from Theorem 4 on both constructed and synthetic models, showing that the representational dissimilarity remains below the value predicted from the distributional distance.
The paper acknowledges limitations, including the non-tightness of the derived bound, the reliance on a diversity assumption that might not hold for all models, and the focus on the final embedding/unembedding layers. It contrasts its $d_{\mathrm{SVD}}$ measure with CCA-based methods, noting that its invariances are tied specifically to the identifiability class of the model family considered. The findings emphasize that the robustness of identifiability results in practical, finite-data settings, where likelihood is only approximately maximized, depends critically on the choice of distributional and representational distances. Understanding and ensuring this robustness is highlighted as a crucial direction for future research in representation learning.