JEPA-SCORE: Self-Supervised Density Estimation
- JEPA-SCORE is a closed-form estimator that computes sample likelihoods using the Jacobian of a JEPA model.
- It leverages anti-collapse regularization to infer data density, aligning computed scores with ground-truth log-densities across various datasets.
- Its practical applications include data curation, outlier detection, and scalable density estimation without requiring explicit generative models.
JEPA-SCORE is a closed-form estimator of sample density derived from Joint Embedding Predictive Architectures (JEPAs), a class of self-supervised representation learning frameworks. While JEPAs have traditionally been employed for learning robust, general-purpose embeddings without explicit generative modeling, recent theoretical advances demonstrate that the anti-collapse regularization typical of JEPAs causes the model to implicitly learn the data distribution. JEPA-SCORE leverages the encoder’s Jacobian at an input sample to extract a closed-form proxy for the log-density that the learned representation assigns to that data point. This provides a tractable, model-agnostic means of density estimation, with applications to data curation, outlier detection, and nonparametric sample likelihood estimation, validated on a range of modalities and JEPA instances (Balestriero et al., 7 Oct 2025).
1. Theoretical Foundation and Motivation
The core principle of JEPA-SCORE derives from the dual objectives underlying JEPA training: (i) a latent-space prediction term, which enforces invariance by requiring the representation of a perturbed sample to be predictable from the unperturbed, and (ii) an anti-collapse term, typically realized by regularization mechanisms (e.g., forcing the empirical covariance of embeddings to match that of a standard Gaussian). The anti-collapse term not only prevents the trivial constant solution but also induces the learned embeddings f(X) to globally match a Gaussian (or hyperspherical uniform) distribution.
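Schematically, the two objectives can be written as follows; the predictor $g$, perturbation $T$, and weight $\lambda$ are generic placeholders (details vary across JEPA variants), not the notation of any single implementation:

$$
\mathcal{L} \;=\; \underbrace{\mathbb{E}_{X,\,T}\!\left[\,\big\| g\big(f(T(X))\big) - f(X) \big\|_2^2\,\right]}_{\text{latent-space prediction}} \;+\; \lambda \, \underbrace{\big\| \widehat{\mathrm{Cov}}\!\left[f(X)\right] - I \big\|_F^2}_{\text{anti-collapse}} .
$$

The second term is what pushes the embedding distribution toward a fixed target (here a standard Gaussian), which is the property JEPA-SCORE exploits.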
Given this setup, the change-of-variable formula for probability densities formalizes the relationship between the data distribution $p_X$ and the induced embedding distribution $p_{f(X)}$, linking them through the Jacobian of the learned mapping $f$:

$$
p_{f(X)}(z) \;=\; \int_{f^{-1}(z)} \frac{p_X(x)}{\prod_{k} \sigma_k\!\left(J_f(x)\right)} \, d\mathcal{H}(x),
$$

where $J_f(x)$ is the Jacobian of $f$ at $x$, $\sigma_k\!\left(J_f(x)\right)$ are its nonzero singular values, and $\mathcal{H}$ is the Hausdorff measure over the level set $f^{-1}(z)$ of $f$ at $z$.
JEPA-SCORE exploits this by inverting the relationship: when $f$ is trained so that $p_{f(X)}$ matches the target Gaussian density, the preimage volume (modulated by the Jacobian) yields a closed-form likelihood proxy for input samples.
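For intuition, the invertible linear case can be checked numerically: if $z = Ax$ with $A$ square and $f(X) \sim \mathcal{N}(0, I)$, the change-of-variable identity reduces to $p_X(x) = p_Z(Ax) \cdot \prod_k \sigma_k(A)$. A minimal sketch (the matrix $A$ and the densities below are illustrative, not taken from the paper):

```python
import numpy as np

# Invertible linear "encoder" z = A x, mapping X ~ N(0, A^{-1} A^{-T}) to Z ~ N(0, I).
A = np.array([[2.0, 0.0],
              [1.0, 1.0]])
x = np.array([0.3, -0.7])

# Standard-normal density of the embedding z = A x.
z = A @ x
p_z = np.exp(-0.5 * z @ z) / (2 * np.pi)

# Change of variables: multiply by the product of singular values of the
# Jacobian, which for a linear map is A itself.
sing = np.linalg.svd(A, compute_uv=False)
p_x_via_cov = p_z * np.prod(sing)

# Direct evaluation of the N(0, Sigma) density with Sigma^{-1} = A^T A.
Sigma_inv = A.T @ A
p_x_direct = (np.sqrt(np.linalg.det(Sigma_inv))
              * np.exp(-0.5 * x @ Sigma_inv @ x) / (2 * np.pi))

assert np.isclose(p_x_via_cov, p_x_direct)
```

The two density evaluations agree, and the product of singular values equals $|\det A|$, which is the familiar invertible-map special case of the formula above.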
2. Formal Definition and Practical Computation
Specialized to practical JEPA settings, the sample-wise density proxy (JEPA-SCORE) is computed as:

$$
\mathrm{JEPA\text{-}SCORE}(x) \;=\; \sum_{k} \log \sigma_k\!\left(J_f(x)\right),
$$

where $\sigma_k\!\left(J_f(x)\right)$ are the singular values of the Jacobian matrix $J_f(x)$ of the encoder $f$ at input $x$.
Implementation of JEPA-SCORE involves the following steps:
- For any trained JEPA model (e.g., I-JEPA, DINOv2, MetaCLIP):
- For an input $x$, perform a forward pass to compute the embedding $f(x)$.
- Compute, via autograd, the Jacobian $J_f(x) = \partial f(x) / \partial x$.
- Perform a singular value decomposition (SVD) of $J_f(x)$.
- For numerical stability, singular values may be clipped below a small threshold (e.g., $10^{-6}$).
- Sum the logarithms of the singular values to yield the JEPA-SCORE.
This scalar can be interpreted (modulo additive constants) as the log-determinant of the local volume change induced by $f$ from input space to embedding space, and hence as a log-likelihood under the learned embedding-induced data density.
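The steps above can be sketched end-to-end. The tanh encoder and finite-difference Jacobian below are illustrative stand-ins, assuming nothing about any particular JEPA implementation; a real pipeline would use a trained encoder and autograd (e.g., `torch.func.jacrev`) instead of finite differences:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-5):
    """Central-difference Jacobian of f: R^D -> R^K at x."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J

def jepa_score(f, x, clip=1e-6):
    """Sum of log singular values of the encoder Jacobian at x."""
    J = numerical_jacobian(f, x)
    s = np.linalg.svd(J, compute_uv=False)
    # Clip tiny singular values for numerical stability before taking logs.
    return float(np.sum(np.log(np.clip(s, clip, None))))

# Toy encoder: one tanh layer, R^4 -> R^2 (stand-in for a trained JEPA encoder).
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))
def encoder(x):
    return np.tanh(W @ x)

score = jepa_score(encoder, rng.normal(size=4))
```

As a sanity check, for a purely linear encoder the score reduces to the sum of log singular values of the weight matrix itself, independent of the input.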
3. Empirical Findings and Cross-Model Validation
JEPA-SCORE has been empirically validated across synthetic and real data settings:
- On Gaussian mixture and controlled synthetic datasets, JEPA-SCORE correlates with ground-truth log-densities: Langevin sampling according to JEPA-SCORE recovers the true density.
- On ImageNet-1k, MNIST, and a Galaxy dataset, JEPA-SCORE computed with I-JEPA, DINOv2, and MetaCLIP correctly assigns higher scores to in-distribution samples and lower scores to out-of-distribution or undersampled cases. Ordering images by JEPA-SCORE within a class reveals semantic alignment with high- or low-density features (e.g., flying birds vs. seated birds).
- Across architectures and modalities, the connection between learned Gaussian embeddings and data density via JEPA-SCORE remains robust, confirming the method’s broad applicability.
4. Applications
JEPA-SCORE enables several downstream uses:
- Data curation: By ranking samples according to JEPA-SCORE, practitioners can select representative (high-density) samples or sparsify overrepresented regions; conversely, samples with low scores may be selected for targeted augmentation or investigation.
- Outlier detection: Samples with exceptionally low JEPA-SCORE are flagged as probable outliers or novelty cases, facilitating dataset cleaning or anomaly detection tasks.
- Density estimation in high dimensions: JEPA-SCORE provides a scalable and efficient alternative to explicit generative modeling for likelihood estimation, without requiring density modeling in input space.
- Model assessment and calibration: Distributions of JEPA-SCORE across datasets indicate coverage (or lack thereof) of the training distribution, providing a diagnostic for representation quality and generalization.
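As a workflow illustration of the curation and outlier-detection uses, samples can be ranked by their scores and partitioned by quantile. The score values below are hypothetical hand-picked numbers, not outputs of any model:

```python
import numpy as np

# Hypothetical per-sample JEPA-SCOREs (log-likelihood proxies) for a batch of 6.
scores = np.array([-1.2, -0.8, -9.5, -1.0, -0.9, -7.3])

# Outlier detection: flag the lowest-scoring fraction of the batch.
threshold = np.quantile(scores, 0.25)
outliers = np.flatnonzero(scores < threshold)   # indices of probable outliers

# Data curation: keep the most representative (highest-score) samples.
keep = np.argsort(scores)[::-1][:4]
```

With these values, the two far-lower scores (indices 2 and 5) fall below the quantile threshold and are flagged, while the remaining samples are retained as representative.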
5. Relation to the JEPA Family and Interpretative Implications
The existence and utility of JEPA-SCORE are a direct consequence of the anti-collapse (diversity) term in JEPA objectives. This term, often regarded as a mere collapse-prevention heuristic, fundamentally forces the network to allocate embedding space in proportion to the empirical data density. As such, any successfully trained JEPA (encompassing I-JEPA, DINOv2, MetaCLIP, etc.) yields a representation from which input sample densities can be extracted, in a model-and-dataset-agnostic manner, using only the encoder’s local Jacobian.
Whereas generative models (VAEs, normalizing flows) model the data density $p_X$ through explicit inversion or likelihood maximization, JEPAs accomplish similar density estimation in a latent, non-generative framework. JEPA-SCORE thus closes the gap between discriminative self-supervised learning and nonparametric density estimation.
6. Limitations and Future Research Directions
While JEPA-SCORE has demonstrated strong alignment with empirical densities in diverse scenarios, several practical and theoretical aspects merit further study:
- Scalability of Jacobian computation for very high-dimensional input or particularly deep architectures.
- Sensitivity to network architecture and regularization hyperparameters.
- Extensions to more complex modalities (e.g., video, multimodal data) or architectures where level set structure is less trivial.
- Integration with training procedures, e.g., for curriculum learning, balanced sampling, or zero-shot model diagnostics using density information.
- Theoretical investigation into the sharpness of the learned density estimator and its relationship to sample likelihoods in non-unimodal or heavily skewed distributions.
7. Summary
JEPA-SCORE provides a theoretically grounded and empirically validated link between self-supervised embedding learning and data density estimation. By exploiting the relationship between JEPA’s anti-collapse constraints and the encoder’s Jacobian, it enables direct recovery of per-sample likelihoods without explicit generative modeling. Applications span data curation, outlier detection, density estimation, and model diagnostics, reflecting its utility as both a practical tool and a conceptual advance at the intersection of representation learning and statistical inference (Balestriero et al., 7 Oct 2025).