Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
139 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
46 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

A Statistical Analysis for Supervised Deep Learning with Exponential Families for Intrinsically Low-dimensional Data (2412.09779v1)

Published 13 Dec 2024 in stat.ML, cs.LG, math.ST, and stat.TH

Abstract: Recent advances have revealed that the rate of convergence of the expected test error in deep supervised learning decays as a function of the intrinsic dimension and not the dimension $d$ of the input space. Existing literature defines this intrinsic dimension as the Minkowski dimension or the manifold dimension of the support of the underlying probability measures, which often results in sub-optimal rates and unrealistic assumptions. In this paper, we consider supervised deep learning when the response given the explanatory variable is distributed according to an exponential family with a $\beta$-H\"older smooth mean function. We consider an entropic notion of the intrinsic data-dimension and demonstrate that with $n$ independent and identically distributed samples, the test error scales as $\tilde{\mathcal{O}}\left(n{-\frac{2\beta}{2\beta + \bar{d}{2\beta}(\lambda)}}\right)$, where $\bar{d}{2\beta}(\lambda)$ is the $2\beta$-entropic dimension of $\lambda$, the distribution of the explanatory variables. This improves on the best-known rates. Furthermore, under the assumption of an upper-bounded density of the explanatory variables, we characterize the rate of convergence as $\tilde{\mathcal{O}}\left( d{\frac{2\lfloor\beta\rfloor(\beta + d)}{2\beta + d}}n{-\frac{2\beta}{2\beta + d}}\right)$, establishing that the dependence on $d$ is not exponential but at most polynomial. We also demonstrate that when the explanatory variable has a lower bounded density, this rate in terms of the number of data samples, is nearly optimal for learning the dependence structure for exponential families.

Summary

We haven't generated a summary for this paper yet.