LeJEPA Epps-Pulley Regularizer

Updated 26 December 2025
  • The LeJEPA Epps-Pulley Regularizer is a distribution-matching technique that leverages the classical Epps–Pulley normality test to enforce isotropic Gaussian embeddings.
  • It employs a sliced Maximum Mean Discrepancy (MMD) approach with a Gaussian prior and kernel, equivalent to a full-dimensional MMD with the confluent-hypergeometric (Kummer) kernel, and supports scalable, hyperparameter-robust optimization.
  • Empirical benchmarks demonstrate its effectiveness, achieving consistent accuracy improvements across diverse architectures with minimal tuning.

The LeJEPA Epps-Pulley Regularizer is a distribution-matching regularization technique central to recent advances in self-supervised learning within the Joint-Embedding Predictive Architecture (JEPA) paradigm. Rooted in the statistical Epps–Pulley normality test and recast as a sliced maximum mean discrepancy (MMD) with a Gaussian prior and kernel, the LeJEPA Epps-Pulley Regularizer provably enforces isotropic Gaussian embeddings and provides scalable, hyperparameter-robust, and heuristics-free optimization for representation learning (Balestriero et al., 11 Nov 2025, Zimmermann et al., 22 Dec 2025).

1. Statistical and Kernel Foundations

The classical Epps–Pulley (EP) statistic tests for normality via characteristic functions. Given a distribution $P$ with characteristic function $\phi_P(\omega)$ and a Gaussian reference $\mathcal{N}(0, \sigma^2)$, the univariate EP distance is

$$\mathrm{EP}(P) = \int_{\mathbb{R}} \left|\phi_P(\omega) - \exp\!\left(-\tfrac{1}{2}\sigma^2 \omega^2\right)\right|^2 \rho(\omega)\, d\omega,$$

where $\rho(\omega)$ is the Bochner spectral density of a Gaussian kernel. Proposition 3.1 in "KerJEPA: Kernel Discrepancies for Euclidean Self-Supervised Learning" (Zimmermann et al., 22 Dec 2025) establishes the equivalence between the EP statistic and the squared maximum mean discrepancy (MMD) with a Gaussian kernel $k(x, y) = \exp(-\gamma (x - y)^2)$:

$$\mathrm{EP}(P) = \mathrm{MMD}_k^2\!\left(P, \mathcal{N}(0, \sigma^2)\right).$$
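
To make the EP-as-MMD identity concrete, the following minimal NumPy sketch estimates $\mathrm{MMD}_k^2$ between an empirical sample and $\mathcal{N}(0, \sigma^2)$ using the closed-form expectations of the Gaussian kernel under the Gaussian reference. The function name and the default $\gamma$, $\sigma$ are illustrative choices, not the papers' implementation.

```python
import numpy as np

def gaussian_mmd2_to_normal(x, gamma=1.0, sigma=1.0):
    """Biased (V-statistic) estimate of MMD_k^2(P_hat, N(0, sigma^2)) for the
    Gaussian kernel k(a, b) = exp(-gamma * (a - b)^2), using the closed-form
    kernel expectations under the Gaussian reference measure."""
    x = np.asarray(x, dtype=float)

    # E_{a,b ~ P_hat}[k(a, b)] over all sample pairs.
    diffs = x[:, None] - x[None, :]
    term_pp = np.exp(-gamma * diffs**2).mean()

    # E_{b ~ N(0, sigma^2)}[k(x_i, b)] has a closed form for Gaussian kernels.
    c = 1.0 + 2.0 * gamma * sigma**2
    term_pq = (np.exp(-gamma * x**2 / c) / np.sqrt(c)).mean()

    # E_{a,b ~ N(0, sigma^2)}[k(a, b)] = (1 + 4*gamma*sigma^2)^(-1/2).
    term_qq = 1.0 / np.sqrt(1.0 + 4.0 * gamma * sigma**2)

    return term_pp - 2.0 * term_pq + term_qq

# Samples already close to N(0, 1) give a near-zero value; shifted samples do not.
rng = np.random.default_rng(0)
print(gaussian_mmd2_to_normal(rng.normal(size=2048)))        # small, ~0
print(gaussian_mmd2_to_normal(rng.normal(2.0, 1.0, 2048)))   # clearly > 0
```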

For multidimensional embeddings, LeJEPA draws random directions, projects data to one dimension, and applies the EP statistic to each, averaging the results.

For a $d$-dimensional embedding distribution $\mathcal{Q}$ on $\mathbb{R}^d$, the regularizer is defined as

$$\mathrm{SIGReg}(\mathcal{Q}) = \mathbb{E}_{\theta \sim \operatorname{Unif}(\mathbb{S}^{d-1})} \left[ n^{-1}\, \widehat{\mathrm{EP}}_n(\theta_\# \mathcal{Q}) \right],$$

where $\theta_\# \mathcal{Q}$ is the projection (pushforward) of $\mathcal{Q}$ along direction $\theta$ and $\widehat{\mathrm{EP}}_n$ denotes the empirical EP statistic computed from $n$ projected samples (Balestriero et al., 11 Nov 2025, Zimmermann et al., 22 Dec 2025).
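
Building on the NumPy sketch above (it reuses gaussian_mmd2_to_normal), a Monte Carlo estimate of the sliced regularizer simply projects the embeddings onto random unit directions and averages the univariate statistic; the parameter names and defaults are again illustrative.

```python
def sliced_sigreg(z, num_slices=64, gamma=1.0, sigma=1.0, rng=None):
    """Monte Carlo estimate of the sliced regularizer: average the univariate
    EP/MMD^2 statistic over random unit directions. Continues the sketch above
    (requires numpy as np and gaussian_mmd2_to_normal)."""
    rng = rng or np.random.default_rng()
    n, d = z.shape
    # Random directions drawn uniformly on the unit sphere S^{d-1}.
    dirs = rng.normal(size=(num_slices, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Project embeddings to 1-D and score each projection against N(0, sigma^2).
    projections = z @ dirs.T                      # shape (n, num_slices)
    return np.mean([gaussian_mmd2_to_normal(projections[:, m], gamma, sigma)
                    for m in range(num_slices)])
```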

2. Sliced MMD and Kummer Kernel

LeJEPA’s regularizer is a sliced MMD using the EP functional, computed via Monte Carlo over random directions. Theorem 4.1 in (Zimmermann et al., 22 Dec 2025) proves that this regularizer is equivalent to an MMD with the confluent-hypergeometric (Kummer) kernel:

$$k_{\mathrm{Kummer}}(x, y) = {}_1F_1\!\left(\tfrac{1}{2};\, \tfrac{d}{2};\, -\gamma \|x - y\|_2^2\right).$$

For large $d$, the Kummer kernel asymptotically converges to the inverse-multiquadric (IMQ) kernel:

$${}_1F_1\!\left(\tfrac{1}{2};\, \tfrac{d}{2};\, -c\right) \approx \left(1 + \tfrac{4c}{2d-3}\right)^{-1/2}.$$

This reduces the regularizer in high dimensions to a heavy-tailed IMQ MMD. The analytic (infinite-slice) version is computationally more expensive at $O(n^2 d)$, but unbiased and robust to embedding dimension, while the finite-slice formulation is $O(n d m)$, trading a small amount of slicing variance for scalability (Zimmermann et al., 22 Dec 2025).
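
A brief SciPy sketch illustrates how closely the IMQ form tracks the Kummer kernel at a representative high dimension; the bandwidth $\gamma$ and the synthetic inputs are assumptions for illustration, not values from the papers.

```python
import numpy as np
from scipy.special import hyp1f1

def kummer_kernel(x, y, gamma=1.0):
    """Confluent-hypergeometric (Kummer) kernel 1F1(1/2; d/2; -gamma*||x-y||^2)."""
    d = x.shape[-1]
    sq = np.sum((x - y) ** 2, axis=-1)
    return hyp1f1(0.5, d / 2.0, -gamma * sq)

def imq_approx(x, y, gamma=1.0):
    """Inverse-multiquadric approximation used in the large-d limit above."""
    d = x.shape[-1]
    sq = np.sum((x - y) ** 2, axis=-1)
    return (1.0 + 4.0 * gamma * sq / (2.0 * d - 3.0)) ** -0.5

d = 512
rng = np.random.default_rng(0)
a, b = rng.normal(size=(2, d)) / np.sqrt(d)   # typical unit-scale embeddings
print(kummer_kernel(a, b), imq_approx(a, b))  # values nearly coincide for large d
```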

3. Optimality of Isotropic Gaussian Embeddings

LeJEPA presents theoretical arguments (linear and nonlinear probe risk bounds) showing that, for a fixed total covariance, isotropic Gaussian embeddings uniquely minimize downstream prediction risk. For ridge regression, the variance term $\sum_k 1/\lambda_k$ (where $\lambda_k$ are embedding covariance eigenvalues) is minimized when all $\lambda_k$ are equal. For nonlinear tasks, the Fisher information functional $J(p)$ in the integrated squared bias is uniquely minimized by the isotropic Gaussian (Balestriero et al., 11 Nov 2025). Thus, the SIGReg regularizer targets an optimal embedding law for both linear and nonlinear prediction.
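
The ridge-regression claim follows from a standard inequality, stated here for completeness rather than taken from the papers: by Cauchy–Schwarz (equivalently, the AM–HM inequality), for eigenvalues $\lambda_1, \dots, \lambda_d > 0$ with fixed trace $\sum_k \lambda_k = c$,

$$\sum_{k=1}^{d} \frac{1}{\lambda_k} \;\ge\; \frac{d^2}{\sum_{k=1}^{d} \lambda_k} = \frac{d^2}{c},$$

with equality exactly when $\lambda_1 = \cdots = \lambda_d = c/d$, i.e., the isotropic case.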

4. Practical Algorithm and Complexity

The SIGReg implementation involves:

  1. Randomly sampling $M$ unit-norm directions $\{\mathbf{a}_m\} \subset \mathbb{S}^{d-1}$.
  2. Projecting the $N$ embeddings $\{z_i\} \subset \mathbb{R}^d$ onto each direction: $u_{i,m} = \mathbf{a}_m^\top z_i$.
  3. Computing the empirical characteristic function and comparing it to the target Gaussian characteristic function using quadrature on a grid of $T$ points.
  4. Averaging the EP statistics over all directions.

PyTorch-style code is provided for integration as a module (Balestriero et al., 11 Nov 2025, Zimmermann et al., 22 Dec 2025). The table below summarizes computational complexity for major variants:

Variant | Complexity | Slicing variance
Finite-slice (LeJEPA default) | $O(ndm)$ | $\sim 1/\sqrt{m}$
Analytic (“unsliced”) | $O(n^2 d)$ | None (unbiased)
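
As a concrete illustration of steps 1–4, here is a hedged PyTorch-style sketch of the finite-slice estimator. The quadrature weighting, grid bound, and function and parameter names are assumptions made for illustration; this is not the released module.

```python
import math
import torch

def sigreg_ep(z, num_dirs=1024, t_points=17, bound=4.0, sigma=1.0):
    """Sketch of the sliced Epps-Pulley regularizer (steps 1-4 above).
    z: (N, d) batch of embeddings. Defaults mirror the text's M=1024, T=17,
    but the spectral weighting and integration bound are assumptions."""
    n, d = z.shape
    # 1. Sample M random unit-norm directions on S^{d-1}.
    dirs = torch.randn(num_dirs, d, device=z.device, dtype=z.dtype)
    dirs = dirs / dirs.norm(dim=1, keepdim=True)
    # 2. Project embeddings onto each direction: u[i, m] = a_m^T z_i.
    u = z @ dirs.T                                          # (N, M)
    # 3. Empirical characteristic function on a grid of T frequencies,
    #    compared against the target Gaussian CF exp(-sigma^2 w^2 / 2).
    omega = torch.linspace(-bound, bound, t_points, device=z.device, dtype=z.dtype)
    phase = u.unsqueeze(-1) * omega                         # (N, M, T)
    ecf_re = phase.cos().mean(dim=0)                        # (M, T)
    ecf_im = phase.sin().mean(dim=0)
    target = torch.exp(-0.5 * (sigma * omega) ** 2)         # (T,)
    sq_err = (ecf_re - target) ** 2 + ecf_im ** 2
    # Gaussian spectral weight rho(w) and trapezoid-style quadrature spacing.
    rho = torch.exp(-0.5 * omega ** 2) / math.sqrt(2 * math.pi)
    dw = omega[1] - omega[0]
    ep_per_dir = (sq_err * rho).sum(dim=1) * dw             # (M,)
    # 4. Average the EP statistics over all directions.
    return ep_per_dir.mean()
```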

SIGReg introduces a single trade-off hyperparameter $\lambda$ and is robust to its value; default settings ($\lambda = 0.05$, $M = 1024$, $T = 17$) yield stable results across architectures and datasets. Ablations show stability of performance within broad ranges of $M$, $T$, and the integration bounds (Balestriero et al., 11 Nov 2025).
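
Assuming the overall objective simply adds the regularizer to the JEPA prediction loss with weight $\lambda$, a training-step fragment might look like the following; pred_loss, embeddings, and sigreg_ep refer to the sketch above and are hypothetical names.

```python
# Hypothetical training-step fragment: combine the prediction loss with the
# regularizer using the default lambda = 0.05, M = 1024, T = 17 from the text.
loss = pred_loss + 0.05 * sigreg_ep(embeddings, num_dirs=1024, t_points=17)
loss.backward()
```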

5. Relation to Epps–Pulley Test Theory

The univariate Epps–Pulley statistic can be expressed via a spectral decomposition of a specific covariance kernel involving Hermite polynomials and a diagonalizing Fredholm determinant (Ebner et al., 2021). For LeJEPA, one may use a finite-rank Karhunen–Loève expansion

$$K(s, t) \approx \sum_{i=1}^{m} \lambda_i\, \phi_i(s)\, \phi_i(t),$$

where the eigenvalues $\lambda_i$ decay exponentially, so truncation after 10–20 terms suffices for a high-precision implementation. This exact framework is the mathematical foundation for the empirical, sketched version used in regularization. The practical approximations (slicing, quadrature) inherit the theoretical consistency established in the classical normality-test literature (Ebner et al., 2021).
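
As a generic illustration of such a finite-rank truncation (not the specific Hermite-polynomial kernel of the EP test), a Nyström-style sketch discretizes a covariance kernel on quadrature nodes and keeps only the leading eigenpairs; the kernel callable, grid, weights, and rank are placeholders.

```python
import numpy as np

def truncated_kl(kernel, nodes, weights, rank=20):
    """Generic finite-rank Karhunen-Loeve (Nystrom) approximation of a
    covariance kernel on quadrature nodes; all arguments are placeholders."""
    # Symmetrically weighted Gram matrix on the quadrature grid.
    sqrt_w = np.sqrt(weights)
    K = kernel(nodes[:, None], nodes[None, :])
    evals, evecs = np.linalg.eigh(sqrt_w[:, None] * K * sqrt_w[None, :])
    # Keep the `rank` largest eigenpairs; fast eigenvalue decay means a small
    # rank already reproduces K(s, t) to high precision.
    idx = np.argsort(evals)[::-1][:rank]
    eigenfunctions = evecs[:, idx] / sqrt_w[:, None]   # eigenfunction values at nodes
    return evals[idx], eigenfunctions
```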

6. Stability, Generalization, and Empirical Benchmarks

LeJEPA with the Epps–Pulley regularizer exhibits:

  • Heuristics-free usage: no stop-gradient, teacher-student, or whitening layers.
  • Strong generalization across architectures (ResNet, ConvNeXt, ViT, Swin, MaxViT) and domains with no per-architecture hyperparameter tuning.
  • Empirical stability: consistent top-1 accuracy ($\pm 0.5\%$ variation across integration/slicing parameters), stable large-scale training (billion-parameter models), and direct correlation ($\geq 0.85$ Spearman) between training loss and downstream accuracy (Balestriero et al., 11 Nov 2025).
  • On ImageNet-1K, LeJEPA achieves 79.0% top-1 with ViT-H/14 under a frozen-backbone linear probe, and scales effectively to larger models (Balestriero et al., 11 Nov 2025).
  • Sliced, analytic, and KSDReg/IMQ variants of the regularizer all achieve $>91\%$ top-1 on the Imagenette benchmark; IMQ-KSDReg achieves the highest accuracy (91.9%), suggesting a benefit of heavy-tailed discrepancies for further stability (Zimmermann et al., 22 Dec 2025).

7. Integration in Kernel Discrepancy and Self-Supervised Learning Paradigms

KerJEPA generalizes LeJEPA's approach, showing that the Epps–Pulley regularizer is a special case of a sliced Gaussian-kernel MMD whose closed-form, full-dimensional equivalent is an MMD with the Kummer kernel, approaching an IMQ kernel in high dimension. This situates SIGReg as a theoretically grounded, computationally efficient, and empirically robust instance of the kernel-discrepancy regularization family. The analytic-slice (infinite-direction) approach offers a dimension-robust alternative at higher computational cost, while the sketched SIGReg offers practical scalability for self-supervised learning. Adopting more flexible kernels (e.g., IMQ) can further improve stability and test accuracy, as demonstrated empirically (Zimmermann et al., 22 Dec 2025).
