Applying dependence-coefficient generalization methods to neural networks

Develop a practical methodology for applying non-IID generalization frameworks that estimate dependence coefficients between random variables to deep neural networks, so that the resulting bounds can be used effectively in neural network settings.

Background

The literature on non-IID generalization includes approaches that estimate dependence coefficients or use stability to account for dependencies among samples. While such methods provide theoretical avenues for non-IID data, their practical application to neural networks remains unclear.
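As a rough illustration of what estimating a dependence coefficient from data can look like, the sketch below computes the sample autocorrelation of a dependent sequence at a few lags. This is an assumption-laden toy: autocorrelation captures only linear dependence and is not one of the coefficients used in the cited bounds, and the names `lag_autocorrelation` and `simulate_ar1`, as well as the AR(1) process standing in for a sequence of per-token losses, are hypothetical choices made for this example.

```python
import random


def lag_autocorrelation(xs, lag):
    """Sample autocorrelation of xs at the given lag (a crude, linear dependence proxy)."""
    n = len(xs)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(xs)")
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if var == 0:
        # A constant sequence carries no measurable linear dependence.
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def simulate_ar1(n, phi, seed=0):
    """Hypothetical stand-in for per-token losses: x_t = phi * x_{t-1} + Gaussian noise."""
    rng = random.Random(seed)
    xs, x = [], 0.0
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs


if __name__ == "__main__":
    losses = simulate_ar1(n=5000, phi=0.8)
    for k in (1, 5, 20):
        # Dependence in an AR(1) process decays as the lag grows.
        print(k, round(lag_autocorrelation(losses, k), 3))
```

With a real model, the synthetic sequence would be replaced by observed per-token losses. Turning such an estimate into a valid generalization bound for a neural network is the part that remains open.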

The authors (Lotfi et al., 2024) instead pursue a martingale-based approach for token-level bounds, and note that applying coefficient-based methods to neural networks remains an unresolved challenge.

References

A related line of work has been to explicitly estimate coefficients which quantify the extent that random variables relate to each other \citep[e.g.,][]{mohri2007stability,kuznetsov2017generalization}. However, it is unclear how best to apply these methods to neural networks.

Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models (2407.18158 - Lotfi et al., 2024) in Section 2, Related Work (Non-IID Generalization bounds)