Identification of the structural functional ω(G) governing power-law scaling

Identify the functional ω(G) of the hidden bipartite factorization graph G (with input factors X_i, output factors Y_j, and parent sets I_j) that governs power-law scaling of the loss with respect to a resource variable ξ (such as compute, model capacity, or dataset size), i.e., determine ω(G) such that the loss satisfies L ∝ ξ^{−ω(G)}.

Background

The authors observe that their results tie scaling behavior to structural complexity but do not show a clear power law across the full experimental regime. Nevertheless, certain experiments hint at a power-law dependence of the loss on resources, with an exponent that depends on the hidden graphical factorization.

They posit the existence of a functional ω(G) capturing the simplicity of the underlying factorization and expect it to decrease with structural complexity. They explicitly leave the precise identification of ω(G) for future work, highlighting an unresolved question linking data structure to scaling-law exponents.
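For a fixed graph G, the conjectured relation L ∝ ξ^{−ω(G)} becomes linear in log-log coordinates, log L = log C − ω(G) log ξ, so the exponent can be estimated empirically as the negated slope of a straight-line fit. The following is a minimal sketch of that estimate, not the authors' procedure: the function name `fit_power_law_exponent`, the prefactor C, and the synthetic data (exponent 0.5, prefactor 3.0) are assumptions made for illustration only. It does not identify ω(G) as a functional of G; it only recovers one exponent from one loss curve.

```python
import numpy as np


def fit_power_law_exponent(xi, loss):
    """Estimate (omega, C) in loss ≈ C * xi**(-omega).

    Hypothetical helper: ordinary least squares on (log xi, log loss).
    Both inputs must be strictly positive.
    """
    log_xi = np.log(np.asarray(xi, dtype=float))
    log_loss = np.log(np.asarray(loss, dtype=float))
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    slope, intercept = np.polyfit(log_xi, log_loss, deg=1)
    return -slope, np.exp(intercept)


# Synthetic, noise-free data (assumed values, not taken from the paper).
xi = np.logspace(2, 5, num=10)      # resource variable, e.g. dataset size
loss = 3.0 * xi ** (-0.5)           # exact power law with omega = 0.5

omega_hat, prefactor_hat = fit_power_law_exponent(xi, loss)
print(f"estimated omega = {omega_hat:.3f}, prefactor = {prefactor_hat:.3f}")
```

On real loss curves the fit should be restricted to the range of ξ where the log-log plot is approximately straight, since the source reports no clear power law across the full experimental regime. Repeating the fit across graphs of varying complexity would give the empirical values that any candidate ω(G) has to match.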

References

As a side note, a close look at some experiments (e.g., Figure~\ref{fig:filtration} or~\ref{fig:generalization}) suggests the existence of a power-law regime, which posits a functional $\omega(G)$ such that

$\mathcal{L} \propto \xi^{-\omega(G)}$, where $\omega(G)$ captures how {\em simple} the underlying factorization is, and is thus expected to be a decreasing function of the complexity parameters of $G$. We leave the precise identification of such a functional for future work.

Learning with Hidden Factorial Structure (2411.01375 - Arnal et al., 2 Nov 2024) in Discussion