Why deep networks assign higher density to simple data

Determine the underlying mechanism that causes trained deep neural networks, across diverse architectures and training paradigms, to assign higher density to simpler samples rather than to more complex samples.

Background

The paper documents a robust empirical regularity across many model families (iGPT, PixelCNN++, Glow, score-based diffusion, DINOv2, I-JEPA): samples judged as visually simpler systematically receive higher estimated density, while more complex samples receive lower density.

This effect appears both within datasets (e.g., CIFAR-10) and across in-distribution/out-of-distribution comparisons (e.g., CIFAR-10 vs. SVHN), and is consistent across independently trained models and external complexity proxies (e.g., JPEG compressibility, gradient-based measures). These observations motivate a fundamental theoretical question about the cause of this pervasive simplicity preference.
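The complexity proxies mentioned above can be illustrated with a minimal sketch. The paper uses JPEG compressibility; here zlib compression (an assumption, swapped in to stay dependency-free) serves as a stand-in codec, since under either codec a shorter compressed code indicates lower complexity. The function name `complexity_proxy` and the toy 1-D "images" are illustrative, not from the paper.

```python
import zlib
import random

def complexity_proxy(pixels: bytes) -> int:
    """Compressed byte length as a crude complexity estimate.

    The paper uses JPEG compressibility; zlib is used here as a
    stand-in codec. Shorter compressed output ~ simpler sample.
    """
    return len(zlib.compress(pixels, level=9))

random.seed(0)
# Flat "image": visually simple, highly compressible.
simple = bytes([128] * 1024)
# Random noise: visually complex, nearly incompressible.
noisy = bytes(random.randrange(256) for _ in range(1024))

# Simpler data compresses to fewer bytes, mirroring the paper's
# finding that simpler samples receive higher estimated density.
print(complexity_proxy(simple) < complexity_proxy(noisy))  # → True
```

Comparing such a proxy against a model's log-density on the same samples is how the correlation reported in the paper would be checked.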

References

But taken together, these observations still leave open a more basic question: why do deep networks, across architectures and training paradigms, keep assigning higher density to simpler samples in the first place?

Deep Networks Favor Simple Data  (2604.00394 - Lu et al., 1 Apr 2026) in Introduction