Conjectured χ/ε sample complexity under the Factorization Hypothesis

Establish whether the sample complexity for learning the discrete conditional distribution p(y|x) with Kullback–Leibler (excess cross-entropy) loss at accuracy ε under the Factorization Hypothesis (FH: p(y|x) = ∏_{j=1}^{ℓ} p(y_j | pa_j) with input factorization X ≅ ∏_{i=1}^{k} X_i and output factorization Y ≅ ∏_{j=1}^{ℓ} Y_j, and parent sets I_j ⊆ [k]) is bounded by χ/ε, where χ = ∑_{j=1}^{ℓ} q_j × |pa_j|, q_j = |Y_j|, and |pa_j| = ∏_{i∈I_j} |X_i|.

Background

Without structural assumptions, the sample complexity for learning a discrete conditional distribution p(y|x) with KL/excess cross-entropy loss to accuracy ε scales as (1/ε)·|X|·|Y|. The paper introduces a hidden factorization of X and Y with output factors y_j depending only on parent subsets pa_j of the input factors, which decomposes the learning task into ℓ subtasks.

Motivated by this decomposition, the authors conjecture an optimistic sample complexity bound that depends on the factorization parameters, χ = ∑_{j=1}^{ℓ} q_j × |pa_j|, which can be exponentially smaller than |X|·|Y|. They note the bound is computed as if the factorization were known, echoing rates in classical nonparametric settings where structural assumptions influence estimation rates even when the specific structure is unknown.
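To make the gap concrete, the following sketch computes χ for a small made-up factorization (the factor sizes and parent sets below are illustrative assumptions, not taken from the paper) and compares it with the unstructured count |X|·|Y|; the corresponding sample complexities at accuracy ε are χ/ε versus |X|·|Y|/ε.

```python
# Sketch: compare the conjectured complexity parameter chi under the
# Factorization Hypothesis with the unstructured parameter |X| * |Y|.
# The instance below (5 binary input factors, 5 binary output factors,
# each output factor with 2 parents) is a hypothetical illustration.
from math import prod

input_sizes = [2] * 5                 # |X_i|, i = 1..k with k = 5
output_sizes = [2] * 5                # q_j = |Y_j|, j = 1..l with l = 5
parent_sets = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # I_j subset of [k]

# |pa_j| = product of the sizes of the parent input factors
pa_sizes = [prod(input_sizes[i] for i in I) for I in parent_sets]

# chi = sum_j q_j * |pa_j|
chi = sum(q * p for q, p in zip(output_sizes, pa_sizes))

# Unstructured parameter: |X| * |Y|
unstructured = prod(input_sizes) * prod(output_sizes)

print(chi)           # 5 outputs, each contributing 2 * 4 = 8  ->  40
print(unstructured)  # 2^5 * 2^5  ->  1024
```

Even at this tiny scale χ = 40 versus |X|·|Y| = 1024; the gap widens exponentially as the number of factors grows while parent sets stay small.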

References

Because the (hidden) factorization assumption transforms the learning task into $\ell$ independent (but unknown) learning tasks, we conjecture the optimistic bound $\chi / \varepsilon$ on the sample complexity, where \begin{equation} \label{eq:SC} \tag{SC} \chi = \sum_{j=1}^{\ell} q_j \times |\mathrm{pa}_j|, \end{equation} which is always smaller than the unstructured count $|X| \times |Y|$ (and often {\em exponentially smaller}). The bound is optimistic because it is computed as if the factorization were known in advance; this can be heuristically justified by the fact that for most classical tasks in nonparametric statistics, the rate of estimation depends directly on structural assumptions (e.g., multi-index models, the manifold hypothesis) even when one does not know the precise instance of these assumptions (e.g., the indices in a multi-index model, or the location of the manifold).

Learning with Hidden Factorial Structure (2411.01375 - Arnal et al., 2 Nov 2024) in Section 3 (Theoretical Analysis), Subsection "Statistical Complexity"