JEPA-SCORE: Self-Supervised Density Estimation
- JEPA-SCORE is a closed-form estimator that computes sample likelihoods using the Jacobian of a JEPA model.
- It leverages anti-collapse regularization to infer data density, aligning computed scores with ground-truth log-densities across various datasets.
- Its practical applications include data curation, outlier detection, and scalable density estimation without requiring explicit generative models.
JEPA-SCORE is a closed-form estimator of sample density derived from Joint Embedding Predictive Architectures (JEPAs), a class of self-supervised representation learning frameworks. While JEPAs have traditionally been employed for learning robust, general-purpose embeddings without explicit generative modeling, recent theoretical advances demonstrate that the anti-collapse regularization typical of JEPAs causes the model to implicitly learn the data distribution. JEPA-SCORE leverages the encoder’s Jacobian at an input sample to extract a closed-form proxy for the log-density that the learned representation assigns to that data point. This provides a tractable, model-agnostic means of density estimation, with applications to data curation, outlier detection, and nonparametric sample likelihood estimation, validated on a range of modalities and JEPA instances (Balestriero et al., 7 Oct 2025).
1. Theoretical Foundation and Motivation
The core principle of JEPA-SCORE derives from the dual objectives underlying JEPA training: (i) a latent-space prediction term, which enforces invariance by requiring the representation of a perturbed sample to be predictable from the unperturbed, and (ii) an anti-collapse term, typically realized by regularization mechanisms (e.g., forcing the empirical covariance of embeddings to match that of a standard Gaussian). The anti-collapse term not only prevents the trivial constant solution but also induces the learned embeddings f(X) to globally match a Gaussian (or hyperspherical uniform) distribution.
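Schematically, the two objectives can be written as follows; the predictor $g$, perturbation $T$, and weight $\lambda$ are generic placeholders (details vary across JEPA variants), not the notation of any single implementation:

$$
\mathcal{L} \;=\; \underbrace{\mathbb{E}_{X,\,T}\!\left[\,\big\| g\big(f(T(X))\big) - f(X) \big\|_2^2\,\right]}_{\text{latent-space prediction}} \;+\; \lambda \, \underbrace{\big\| \widehat{\mathrm{Cov}}\!\left[f(X)\right] - I \big\|_F^2}_{\text{anti-collapse}} .
$$

The second term is what pushes the embedding distribution toward a fixed target (here a standard Gaussian), which is the property JEPA-SCORE exploits.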
Given this setup, the change-of-variable formula for probability densities formalizes the relationship between the data distribution $p_X$ and the induced embedding distribution $p_{f(X)}$, linking them through the Jacobian of the learned mapping $f$:

$$
p_{f(X)}(z) \;=\; \int_{f^{-1}(z)} \frac{p_X(x)}{\prod_{k} \sigma_k\!\left(J_f(x)\right)} \, d\mathcal{H}(x),
$$

where $J_f(x)$ is the Jacobian of $f$ at $x$, $\sigma_k\!\left(J_f(x)\right)$ are its nonzero singular values, and $\mathcal{H}$ is the Hausdorff measure over the level set $f^{-1}(z)$ of $f$ at $z$.
JEPA-SCORE exploits this by inverting the relationship: when $f$ is trained so that $p_{f(X)}$ matches the target Gaussian density, the preimage volume (modulated by the Jacobian) yields a closed-form likelihood proxy for input samples.
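For intuition, the invertible linear case can be checked numerically: if $z = Ax$ with $A$ square and $f(X) \sim \mathcal{N}(0, I)$, the change-of-variable identity reduces to $p_X(x) = p_Z(Ax) \cdot \prod_k \sigma_k(A)$. A minimal sketch (the matrix $A$ and the densities below are illustrative, not taken from the paper):

```python
import numpy as np

# Invertible linear "encoder" z = A x, mapping X ~ N(0, A^{-1} A^{-T}) to Z ~ N(0, I).
A = np.array([[2.0, 0.0],
              [1.0, 1.0]])
x = np.array([0.3, -0.7])

# Standard-normal density of the embedding z = A x.
z = A @ x
p_z = np.exp(-0.5 * z @ z) / (2 * np.pi)

# Change of variables: multiply by the product of singular values of the
# Jacobian, which for a linear map is A itself.
sing = np.linalg.svd(A, compute_uv=False)
p_x_via_cov = p_z * np.prod(sing)

# Direct evaluation of the N(0, Sigma) density with Sigma^{-1} = A^T A.
Sigma_inv = A.T @ A
p_x_direct = (np.sqrt(np.linalg.det(Sigma_inv))
              * np.exp(-0.5 * x @ Sigma_inv @ x) / (2 * np.pi))

assert np.isclose(p_x_via_cov, p_x_direct)
```

The two density evaluations agree, and the product of singular values equals $|\det A|$, which is the familiar invertible-map special case of the formula above.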
2. Formal Definition and Practical Computation
Specialized to practical JEPA settings, the sample-wise density proxy (JEPA-SCORE) is computed as:

$$
\mathrm{JEPA\text{-}SCORE}(x) \;=\; \sum_{k} \log \sigma_k\!\left(J_f(x)\right),
$$

where $\sigma_k\!\left(J_f(x)\right)$ are the singular values of the Jacobian matrix $J_f(x)$ of the encoder $f$ at input $x$.
Implementation of JEPA-SCORE involves the following steps:
- For any trained JEPA model (e.g., I-JEPA, DINOv2, MetaCLIP):
- For an input $x$, perform a forward pass to compute the embedding $f(x)$.
- Compute, via autograd, the Jacobian $J_f(x) = \partial f(x) / \partial x$.
- Perform a singular value decomposition (SVD) of $J_f(x)$.
- For numerical stability, singular values may be clipped below a small threshold (e.g., $10^{-6}$).
- Sum the logarithms of the singular values to yield the JEPA-SCORE.
This scalar can be interpreted (modulo additive constants) as the log-determinant of the local volume change induced by $f$ from input space to embedding space, and hence as a log-likelihood under the learned embedding-induced data density.
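The steps above can be sketched end-to-end. The tanh encoder and finite-difference Jacobian below are illustrative stand-ins, assuming nothing about any particular JEPA implementation; a real pipeline would use a trained encoder and autograd (e.g., `torch.func.jacrev`) instead of finite differences:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-5):
    """Central-difference Jacobian of f: R^D -> R^K at x."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

def jepa_score(f, x, clip=1e-6):
    """Sum of log singular values of the encoder Jacobian at x."""
    J = numerical_jacobian(f, x)
    s = np.linalg.svd(J, compute_uv=False)
    # Clip tiny singular values for numerical stability before taking logs.
    return float(np.sum(np.log(np.clip(s, clip, None))))

# Toy encoder: one tanh layer, R^4 -> R^2 (stand-in for a trained JEPA encoder).
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))
def encoder(x):
    return np.tanh(W @ x)

score = jepa_score(encoder, rng.normal(size=4))
```

As a sanity check, for a purely linear encoder the score reduces to the sum of log singular values of the weight matrix itself, independent of the input.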
3. Empirical Findings and Cross-Model Validation
JEPA-SCORE has been empirically validated across synthetic and real data settings:
- On Gaussian mixture and controlled synthetic datasets, JEPA-SCORE correlates with ground-truth log-densities: Langevin sampling according to JEPA-SCORE recovers the true density.
- On ImageNet-1k, MNIST, and a Galaxy dataset, JEPA-SCORE computed with I-JEPA, DINOv2, and MetaCLIP correctly assigns higher scores to in-distribution samples and lower scores to out-of-distribution or undersampled cases. Ordering images by JEPA-SCORE within a class reveals semantic alignment with high- or low-density features (e.g., flying birds vs. seated birds).
- Across architectures and modalities, the connection between learned Gaussian embeddings and data density via JEPA-SCORE remains robust, confirming the method’s broad applicability.
4. Applications
JEPA-SCORE enables several downstream uses:
- Data curation: By ranking samples according to JEPA-SCORE, practitioners can select representative (high-density) samples or sparsify overrepresented regions; conversely, samples with low scores may be selected for targeted augmentation or investigation.
- Outlier detection: Samples with exceptionally low JEPA-SCORE are flagged as probable outliers or novelty cases, facilitating dataset cleaning or anomaly detection tasks.
- Density estimation in high dimensions: JEPA-SCORE provides a scalable and efficient alternative to explicit generative modeling for likelihood estimation, without requiring density modeling in input space.
- Model assessment and calibration: Distributions of JEPA-SCORE across datasets indicate coverage (or lack thereof) of the training distribution, providing a diagnostic for representation quality and generalization.
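As a workflow illustration of the curation and outlier-detection uses, samples can be ranked by their scores and partitioned by quantile. The score values below are hypothetical hand-picked numbers, not outputs of any model:

```python
import numpy as np

# Hypothetical per-sample JEPA-SCOREs (log-likelihood proxies) for a batch of 6.
scores = np.array([-1.2, -0.8, -9.5, -1.0, -0.9, -7.3])

# Outlier detection: flag the lowest-scoring fraction of the batch.
threshold = np.quantile(scores, 0.25)
outliers = np.flatnonzero(scores < threshold)   # indices of probable outliers

# Data curation: keep the most representative (highest-score) samples.
keep = np.argsort(scores)[::-1][:4]
```

With these values, the two far-lower scores (indices 2 and 5) fall below the quantile threshold and are flagged, while the remaining samples are retained as representative.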
5. Relation to the JEPA Family and Interpretative Implications
The existence and utility of JEPA-SCORE are a direct consequence of the anti-collapse (diversity) term in JEPA objectives. This term, often regarded as a mere collapse-prevention heuristic, fundamentally forces the network to allocate embedding space in proportion to the empirical data density. As such, any successfully trained JEPA (encompassing I-JEPA, DINOv2, MetaCLIP, etc.) yields a representation from which input sample densities can be extracted, in a model-and-dataset-agnostic manner, using only the encoder’s local Jacobian.
Whereas generative models (VAEs, normalizing flows) model the data density $p_X$ through explicit inversion or likelihood maximization, JEPAs accomplish similar density estimation in a latent, non-generative framework. JEPA-SCORE thus closes the gap between discriminative self-supervised learning and nonparametric density estimation.
6. Limitations and Future Research Directions
While JEPA-SCORE has demonstrated strong alignment with empirical densities in diverse scenarios, several practical and theoretical aspects merit further study:
- Scalability of Jacobian computation for very high-dimensional input or particularly deep architectures.
- Sensitivity to network architecture and regularization hyperparameters.
- Extensions to more complex modalities (e.g., video, multimodal data) or architectures where level set structure is less trivial.
- Integration with training procedures, e.g., for curriculum learning, balanced sampling, or zero-shot model diagnostics using density information.
- Theoretical investigation into the sharpness of the learned density estimator and its relationship to sample likelihoods in non-unimodal or heavily skewed distributions.
7. Summary
JEPA-SCORE provides a theoretically grounded and empirically validated link between self-supervised embedding learning and data density estimation. By exploiting the relationship between JEPA’s anti-collapse constraints and the encoder’s Jacobian, it enables direct recovery of per-sample likelihoods without explicit generative modeling. Applications span data curation, outlier detection, density estimation, and model diagnostics, reflecting its utility as both a practical tool and a conceptual advance at the intersection of representation learning and statistical inference (Balestriero et al., 7 Oct 2025).