Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
134 tokens/sec
GPT-4o
10 tokens/sec
Gemini 2.5 Pro Pro
47 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Generative random latent features models and statistics of natural images (2212.02987v2)

Published 6 Dec 2022 in cond-mat.dis-nn

Abstract: Complex, multivariable systems are often analyzed by grouping their constituent units into components, sometimes referred to as latent features, which afford physical or biological interpretation. However, a priori many different types of latent features and data decompositions can be defined, and one typically uses a trial and error approach to determine a decomposition that is natural to the system and its data. It is highly desirable to develop principled understanding of which decomposition is appropriate for given a data set. In this work, we take a step in this direction and argue that sample-sample correlations in the data carry important information to this effect. For this we construct a generative random latent feature matrix model of large data based on linear mixing of latent features. Key ingredient of our model is that we allow for statistical dependence between the mixing coefficients and argue that the model captures characteristic properties found in many types of natural data. Latent dimensionality and correlation patterns of the data are controlled by only two model parameters. The model's data patterns include (overlapping) clusters, sparse mixing, and constrained (non-negative) mixing. We describe the characteristic correlation and eigenvalue distributions of each pattern. Finally, we fit the model on correlation data from natural images and find a near perfect match with the sparse mixing regime of our model. This finding is in line with the well-known sparse coding structure in natural scene images and provides information about the appropriate data decomposition, namely a sparse coding scheme. We believe that our work will deliver similar insights for diverse data of biological systems.

Summary

We haven't generated a summary for this paper yet.