Complex distributions emerging in filtering and compression (1906.11266v3)

Published 26 Jun 2019 in cond-mat.dis-nn and physics.data-an

Abstract: In filtering, each output is produced by a certain number of different inputs. We explore the statistics of this degeneracy in an explicitly treatable filtering problem in which filtering performs the maximal compression of relevant information contained in inputs (arrays of zeroes and ones). This problem serves as a reference model for the statistics of filtering and related sampling problems. The filter patterns in this problem conveniently allow a microscopic, combinatorial consideration. This allows us to find the statistics of outputs, namely the exact distribution of output degeneracies, for arbitrary input sizes. We observe that the resulting degeneracy distribution of outputs decays as $e^{-c\log^\alpha d}$ with degeneracy $d$, where $c$ is a constant and the exponent $\alpha>1$, i.e. faster than a power law. Importantly, its form essentially depends on the size of the input data set, appearing to be closer to a power-law dependence for small data set sizes than for large ones. We demonstrate that for sufficiently small input data set sizes typical for empirical studies, this distribution could be easily perceived as a power law. We extend our results to filter patterns of various sizes and demonstrate that the shortest filter pattern provides the maximally informative representations of the inputs.
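To make the "faster than a power law" statement concrete, the sketch below evaluates the local slope of $e^{-c\log^\alpha d}$ in log-log coordinates, which is $-c\,\alpha\,\log^{\alpha-1} d$. For $\alpha = 1$ the slope is constant (a pure power law), while for $\alpha > 1$ it steepens slowly with $d$, which is why a narrow range of degeneracies can resemble a power law. The function names and the values of `C` and `ALPHA` are illustrative assumptions, not taken from the paper.

```python
import math


def log_density(d, c, alpha):
    """Return log f(d) for f(d) = exp(-c * (log d)**alpha), with d > 1."""
    return -c * math.log(d) ** alpha


def local_loglog_slope(d, c, alpha):
    """Analytic slope d(log f)/d(log d) = -c * alpha * (log d)**(alpha - 1)."""
    return -c * alpha * math.log(d) ** (alpha - 1)


# Hypothetical parameter values, chosen only for illustration.
C, ALPHA = 1.0, 1.3

# Local log-log slope at several degeneracies d.
slopes = {d: local_loglog_slope(d, C, ALPHA) for d in (10, 100, 1000, 10**6)}

for d, s in slopes.items():
    print(f"d = {d:>7}: local log-log slope = {s:.3f}")
```

Because the slope changes only through the factor $\log^{\alpha-1} d$, it varies slowly over one or two decades of $d$, consistent with the abstract's remark that small data sets can make the distribution look like a power law.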



Authors (4)