
Entropy and Learning of Lipschitz Functions under Log-Concave Measures (2509.10355v1)

Published 12 Sep 2025 in math.PR and math.FA

Abstract: We study regression of $1$-Lipschitz functions under a log-concave measure $\mu$ on $\mathbb{R}^d$. We focus on the high-dimensional regime where the sample size $n$ is subexponential in $d$, in which distribution-free estimators are ineffective. We analyze two polynomial-based procedures: the projection estimator, which relies on knowledge of an orthogonal polynomial basis of $\mu$, and the least-squares estimator over low-degree polynomials, which requires no knowledge of $\mu$ whatsoever. Their risk is governed by the rate of polynomial approximation of Lipschitz functions in $L^2(\mu)$. When this rate matches the Gaussian one, we show that both estimators achieve minimax bounds over a wide range of parameters. A key ingredient is sharp entropy estimates for the class of $1$-Lipschitz functions in $L^2(\mu)$, which are new even in the Gaussian setting.

Summary

  • The paper presents two polynomial-based estimators—projection and least-squares—that leverage low-degree polynomial approximations to efficiently learn 1-Lipschitz functions.
  • It provides sharp upper and lower bounds on the $L^2(\mu)$ risk and on the metric entropy, showing that subexponential sample sizes suffice for high-dimensional regression.
  • The study demonstrates estimator robustness even under unknown log-concave measures, offering practical insights and computationally tractable methods for high-dimensional settings.

Entropy and Learning of Lipschitz Functions under Log-Concave Measures

Problem Setting and Motivation

This work addresses the regression problem for $1$-Lipschitz functions $f: \mathbb{R}^d \to \mathbb{R}$ under a log-concave measure $\mu$ in high dimensions, with a focus on the subexponential sample regime ($n \ll \exp(d)$). The central challenge is to construct estimators $\hat{f}$ for $f$ from noisy samples $(X_i, Y_i)$, where $Y_i = f(X_i) + \xi_i$ and the $\xi_i$ are i.i.d. Gaussian noise, and to analyze the minimax risk in $L^2(\mu)$. The paper is motivated by the inadequacy of distribution-free estimators in this regime and the need for procedures that exploit the structure of log-concave measures and the regularity of the function class.
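For reference, the minimax risk discussed below (denoted $\mathcal{R}^*_{n,d}$) is the standard quantity; the following formalization is consistent with this setup rather than a verbatim definition from the paper:

$$\mathcal{R}^*_{n,d} = \inf_{\hat{f}}\ \sup_{f \,:\, \|f\|_{\mathrm{Lip}} \leq 1}\ \mathbb{E}\,\|\hat{f} - f\|_{L^2(\mu)}^2,$$

where the infimum runs over all measurable estimators $\hat{f} = \hat{f}((X_1, Y_1), \ldots, (X_n, Y_n))$ and the expectation is over both the samples and the noise.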

Polynomial-Based Estimation Procedures

Two polynomial-based estimators are analyzed:

  1. Projection Estimator: Assumes knowledge of an orthonormal polynomial basis for $L^2(\mu)$. The estimator projects the empirical data onto the space of polynomials of degree at most $m$, with coefficients estimated from the data. For the Gaussian case, this corresponds to Hermite polynomials; for general log-concave $\mu$, any orthonormal polynomial basis suffices.
  2. Least-Squares Estimator: Does not require knowledge of $\mu$. It selects the polynomial of degree at most $m$ that minimizes the empirical squared error over the observed data.

The performance of both estimators is governed by the rate of $L^2(\mu)$-approximation of $1$-Lipschitz functions by low-degree polynomials, denoted $\Psi_\mu(m)$. For the Gaussian measure, $\Psi_\gamma(m) \lesssim 1/\sqrt{m}$ (equivalently, $\Psi_\gamma^2(m) \lesssim 1/m$); for general log-concave measures, the rate may be slower but is always subpolynomial.
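For orientation, the bound $\Psi_\gamma(m) \leq (m+1)^{-1/2}$ already follows from a one-line spectral argument (a standard fact included here as a sanity check; the paper's sharper statements require finer analysis). Writing a $1$-Lipschitz $f$ in the orthonormal Hermite basis, $f = \sum_\alpha \hat{f}_\alpha H_\alpha$, and letting $\Pi_m$ denote projection onto degree at most $m$,

$$\|f - \Pi_m f\|_{L^2(\gamma)}^2 = \sum_{|\alpha| > m} \hat{f}_\alpha^2 \leq \frac{1}{m+1} \sum_{\alpha} |\alpha|\, \hat{f}_\alpha^2 = \frac{1}{m+1}\, \mathbb{E}_\gamma |\nabla f|^2 \leq \frac{1}{m+1},$$

where the middle identity is the Gaussian (Ornstein-Uhlenbeck) Dirichlet form identity.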

Implementation Details

  • Projection Estimator: For known $\mu$, compute empirical averages of the basis polynomials against the data, with a variance reduction step for the mean coefficient. The estimator is

$$\hat{f} = \sum_{|\alpha| \leq m} \hat{f}_\alpha P_\alpha,$$

where the $\hat{f}_\alpha$ are empirical averages as specified in the paper.

  • Least-Squares Estimator: For unknown $\mu$, solve the empirical risk minimization

$$\hat{f}_{LS} = \arg\min_{\deg(P) \leq m} \sum_{i=1}^n (P(X_i) - Y_i)^2.$$

This is an unconstrained quadratic minimization (ordinary least squares) in the coefficients of the polynomial basis.

  • Choice of Degree $m$: The optimal $m$ balances the approximation error $\Psi_\mu(m)$ against the estimation error, which scales with the dimension $D = \binom{d+m}{m}$ of the polynomial space relative to the sample size $n$.
  • Computational Considerations: For moderate $m$ and large $d$, $D$ can be large, but for $m = O(\log n / \log d)$, $D$ remains subexponential in $d$ when $n$ is subexponential in $d$ (see the sketch below).
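To make the procedure concrete, here is a minimal NumPy sketch of the least-squares estimator over polynomials of total degree at most $m$. This is an illustration under the setup above, not the paper's implementation: all function names are chosen here, and a plain monomial basis is used since it spans the same space as any orthonormal basis.

```python
import itertools
import math

import numpy as np

def multi_indices(d, m):
    """All multi-indices alpha in N^d with total degree |alpha| <= m."""
    idx = []
    for total in range(m + 1):
        for combo in itertools.combinations_with_replacement(range(d), total):
            alpha = np.zeros(d, dtype=int)
            for j in combo:
                alpha[j] += 1
            idx.append(alpha)
    return np.array(idx)  # shape (D, d), with D = C(d + m, m)

def monomial_features(X, alphas):
    """Design matrix Phi[i, k] = prod_j X[i, j] ** alphas[k, j]."""
    return np.prod(X[:, None, :] ** alphas[None, :, :], axis=2)

def least_squares_estimator(X, Y, m):
    """Empirical risk minimizer over polynomials of total degree <= m."""
    alphas = multi_indices(X.shape[1], m)
    coef, *_ = np.linalg.lstsq(monomial_features(X, alphas), Y, rcond=None)
    return lambda Z: monomial_features(np.atleast_2d(Z), alphas) @ coef

# Toy illustration: standard Gaussian design, f(x) = ||x||_2 is 1-Lipschitz.
rng = np.random.default_rng(0)
d, n, m = 5, 2000, 3
print("polynomial-space dimension D =", math.comb(d + m, m))  # D = 56
X = rng.standard_normal((n, d))
f = lambda Z: np.linalg.norm(Z, axis=1)
Y = f(X) + 0.1 * rng.standard_normal(n)
f_hat = least_squares_estimator(X, Y, m)
X_test = rng.standard_normal((20_000, d))
print("held-out squared L2(mu) error:", np.mean((f_hat(X_test) - f(X_test)) ** 2))
```

Increasing $m$ shrinks the approximation error $\Psi_\mu(m)$ but inflates $D = \binom{d+m}{m}$, which is the trade-off behind keeping $m$ of order $\log n / \log d$.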

Main Theoretical Results

Upper Bounds

  • Projection Estimator: For $n$ in the range $d^5 \leq n \leq e^{\sqrt{d} \log d}$, with $m = \lfloor \log n / \log d \rfloor - 4$, the risk satisfies

$$\mathbb{E} \|f - \hat{f}\|_{L^2(\mu)}^2 \leq \Psi_\mu^2(m) + O(1/d).$$

For larger $n$, the additive error term decays as $O(1/m^2)$.

  • Least-Squares Estimator: Achieves comparable risk bounds over a slightly smaller range of $n$, with an additional logarithmic factor in the estimation error due to the lack of knowledge of $\mu$.
  • Gaussian Case: For $\mu = \gamma_d$, both estimators achieve the minimax rate, with $\Psi_\gamma^2(m) \lesssim 1/m$; a sketch of the projection estimator in this case follows below.
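To make the Gaussian case concrete, here is a minimal sketch of the projection estimator using normalized probabilists' Hermite polynomials as the orthonormal basis. This is an illustrative reconstruction: it omits the paper's variance-reduction step for the mean coefficient, and the helper names are chosen here.

```python
import itertools
import math

import numpy as np
from numpy.polynomial import hermite_e  # probabilists' Hermite polynomials He_k

def hermite_column(x, k):
    """Normalized He_k(x) / sqrt(k!), orthonormal in L2 of the standard Gaussian."""
    c = np.zeros(k + 1)
    c[k] = 1.0
    return hermite_e.hermeval(x, c) / math.sqrt(math.factorial(k))

def hermite_design(X, alphas):
    """Design matrix Phi[i, a] = prod_j He_{alpha[a][j]}(X[i, j]) / sqrt(alpha[a][j]!)."""
    Phi = np.ones((X.shape[0], len(alphas)))
    for a, alpha in enumerate(alphas):
        for j, k in enumerate(alpha):
            if k > 0:
                Phi[:, a] *= hermite_column(X[:, j], k)
    return Phi

def projection_estimator(X, Y, m):
    """hat f = sum_{|alpha| <= m} hat f_alpha H_alpha, with hat f_alpha = mean(Y_i * H_alpha(X_i))."""
    d = X.shape[1]
    # Naive enumeration of multi-indices of total degree <= m; fine for small d.
    alphas = [a for a in itertools.product(range(m + 1), repeat=d) if sum(a) <= m]
    coefs = hermite_design(X, alphas).T @ Y / len(Y)  # empirical Hermite coefficients
    return lambda Z: hermite_design(np.atleast_2d(Z), alphas) @ coefs
```

Because the normalized Hermite basis is orthonormal for $\gamma_d$ and the noise is mean-zero and independent of $X$, each empirical average $\frac{1}{n}\sum_i Y_i H_\alpha(X_i)$ is an unbiased estimate of the corresponding Hermite coefficient of $f$.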

Lower Bounds and Metric Entropy

  • Metric Entropy of Lipschitz Functions: The paper establishes sharp lower bounds for the metric entropy $H_L^\mu(\varepsilon)$ of the class of $1$-Lipschitz functions in $L^2(\mu)$, new even in the Gaussian case. For isotropic product log-concave measures,

$$H_L^\mu(\varepsilon) \gtrsim \binom{d}{\lfloor c/\varepsilon^2 \rfloor}$$

for $\varepsilon \gg d^{-1/4}$.

  • Minimax Lower Bound: Using Fano's method and the entropy estimates, the minimax risk for learning $1$-Lipschitz functions is lower bounded by

$$\mathcal{R}^*_{n,d} \gtrsim \frac{\log d}{\log n}$$

for $n$ up to $e^{c d^{2\eta} \log d}$ in the general log-concave case, or $e^{c \sqrt{d} \log d}$ in the product case.

  • Matching Upper and Lower Bounds: For measures with $\Psi_\mu^2(m) \lesssim 1/m$ (e.g., the Gaussian measure, or the uniform measure on the hypercube), the projection and least-squares estimators achieve the minimax rate in the specified sample regimes; a heuristic for how this rate arises is sketched below.
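A back-of-the-envelope way to see where this rate comes from (a heuristic reconstruction, not the paper's proof): Fano's method gives a squared-risk lower bound of order $\varepsilon^2$ as long as the metric entropy at scale $\varepsilon$ dominates the information carried by the $n$ noisy samples. Since $H_L^\mu(\varepsilon) \gtrsim \binom{d}{\lfloor c/\varepsilon^2 \rfloor} \approx d^{\Theta(1/\varepsilon^2)}$, this remains the case while $n \lesssim d^{c'/\varepsilon^2}$, i.e., down to the scale

$$\varepsilon^2 \asymp \frac{\log d}{\log n},$$

which matches the stated bound.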

Contrasts with Classical Nonparametric Regression

  • Curse of Dimensionality: Classical nonparametric estimators (e.g., nearest neighbors) require $n \gtrsim \exp(d)$ samples for nontrivial risk, whereas the polynomial-based estimators here achieve minimax rates for $n$ subexponential in $d$.
  • Sample Complexity: To achieve $L^2$ error $\varepsilon$, it suffices to take $n \simeq d^{c/\varepsilon^2}$ samples in the Gaussian case; a rough count is given below.
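The exponent $c/\varepsilon^2$ can be read off from the degree choice (a rough count consistent with the bounds above, not a quotation from the paper): reaching $L^2$ error $\varepsilon$ requires degree $m \gtrsim c/\varepsilon^2$, since $\Psi_\gamma(m) \lesssim 1/\sqrt{m}$, and fitting the resulting coefficient vector of dimension $D = \binom{d+m}{m} \approx d^m$ takes on the order of $D$ samples, which gives $n \approx d^{c/\varepsilon^2}$.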

Technical Innovations

  • Sharp Entropy Bounds: The paper provides new lower bounds for the metric entropy of $1$-Lipschitz functions under isotropic log-concave measures, using random polynomial constructions and properties of the Langevin semigroup.
  • Polynomial Approximation Rates: The analysis leverages tensorization and concentration properties of log-concave measures to obtain dimension-dependent rates for polynomial approximation of Lipschitz functions.
  • Robustness to Unknown Measure: The least-squares estimator is shown to be nearly minimax optimal even without knowledge of $\mu$, provided $\mu$ is log-concave and isotropic.

Implications and Future Directions

Practical Implications

  • High-Dimensional Regression: The results provide a principled approach for regression of Lipschitz functions in high dimensions under log-concave distributions, with provable guarantees in regimes where classical methods fail.
  • Algorithmic Simplicity: Both estimators are computationally tractable for moderate $m$ and can be implemented using standard linear algebra routines.
  • Robustness: The least-squares estimator is applicable without knowledge of the underlying measure, making it suitable for practical scenarios with unknown or complex distributions.

Theoretical Implications

  • Entropy of Function Classes: The entropy bounds for Lipschitz functions under log-concave measures are new and may have further applications in empirical process theory and statistical learning.
  • Extension to Other Function Classes: The techniques may be adapted to other regularity classes (e.g., Sobolev, bounded variation) and to other structured measures.
  • Connections to Isoperimetry and Concentration: The results highlight the interplay between functional inequalities (Poincaré, log-Sobolev), polynomial approximation, and statistical estimation.

Future Developments

  • Sharper Approximation Rates: Further work may refine the polynomial approximation rates for specific log-concave measures, especially in non-product cases.
  • Beyond Lipschitz Functions: Extending the analysis to broader function classes or to settings with weaker regularity assumptions.
  • Adaptive Procedures: Developing estimators that adapt to unknown smoothness or intrinsic dimension.
  • Applications to Active Learning and Experimental Design: Leveraging the entropy and approximation results for optimal sampling strategies in high dimensions.

Conclusion

This paper rigorously characterizes the statistical and metric complexity of learning $1$-Lipschitz functions under log-concave measures in high dimensions. By analyzing polynomial-based estimators and establishing sharp entropy bounds, it demonstrates that minimax-optimal rates are achievable in the subexponential sample regime, in stark contrast to classical nonparametric methods. The results have significant implications for high-dimensional statistics, learning theory, and the analysis of function spaces under structured measures, and open several avenues for further research in both theory and practice.
