Entropy and Learning of Lipschitz Functions under Log-Concave Measures
(2509.10355v1)
Published 12 Sep 2025 in math.PR and math.FA
Abstract: We study regression of $1$-Lipschitz functions under a log-concave measure $\mu$ on $\mathbb{R}^d$. We focus on the high-dimensional regime where the sample size $n$ is subexponential in $d$, in which distribution-free estimators are ineffective. We analyze two polynomial-based procedures: the projection estimator, which relies on knowledge of an orthogonal polynomial basis of $\mu$, and the least-squares estimator over low-degree polynomials, which requires no knowledge of $\mu$ whatsoever. Their risk is governed by the rate of polynomial approximation of Lipschitz functions in $L^2(\mu)$. When this rate matches the Gaussian one, we show that both estimators achieve minimax bounds over a wide range of parameters. A key ingredient is sharp entropy estimates for the class of $1$-Lipschitz functions in $L^2(\mu)$, which are new even in the Gaussian setting.
Summary
The paper presents two polynomial-based estimators—projection and least-squares—that leverage low-degree polynomial approximations to efficiently learn 1-Lipschitz functions.
It provides sharp upper and lower bounds on the L2 risk and metric entropy, showing that subexponential sample regimes suffice for high-dimensional regression.
The study demonstrates estimator robustness even under unknown log-concave measures, offering practical insights and computationally tractable methods for high-dimensional settings.
Problem Setting and Motivation
This work addresses the regression problem for $1$-Lipschitz functions $f:\mathbb{R}^d \to \mathbb{R}$ under a log-concave measure $\mu$ in high dimensions, with a focus on the subexponential sample regime ($n \ll \exp(d)$). The central challenge is to construct estimators $\hat f$ of $f$ from noisy samples $(X_i, Y_i)$, where $Y_i = f(X_i) + \xi_i$ and the $\xi_i$ are i.i.d. Gaussian noise, and to analyze the minimax risk in $L^2(\mu)$. The paper is motivated by the inadequacy of distribution-free estimators in this regime and the need for procedures that exploit the structure of log-concave measures and the regularity of the function class.
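For reference, the minimax risk that appears below can be written in the standard form (notation assumed here for concreteness)
$$R^*_{n,d} \;=\; \inf_{\hat f}\;\sup_{f \;1\text{-Lipschitz}}\; \mathbb{E}\,\big\|\hat f - f\big\|_{L^2(\mu)}^2,$$
where the infimum runs over all estimators $\hat f = \hat f\big((X_1,Y_1),\dots,(X_n,Y_n)\big)$ and the expectation is over the samples and the noise.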
Polynomial-Based Estimation Procedures
Two polynomial-based estimators are analyzed:
Projection Estimator: Assumes knowledge of an orthonormal polynomial basis for $L^2(\mu)$. The estimator projects the empirical data onto the space of polynomials of degree at most m, with coefficients estimated from the data. For the Gaussian case, this corresponds to Hermite polynomials; for general log-concave μ, any orthonormal polynomial basis suffices.
Least-Squares Estimator: Does not require knowledge of μ. It selects the polynomial of degree at most m that minimizes the empirical squared error over the observed data.
Both estimators' performance is governed by the $L^2(\mu)$-approximation rate of $1$-Lipschitz functions by low-degree polynomials, denoted $\Psi_\mu(m)$. For the Gaussian measure, $\Psi_\gamma(m) \lesssim 1/\sqrt{m}$; for general log-concave measures, the rate may be slower but is always subpolynomial.
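One standard way to see the Gaussian rate (a sketch of the classical spectral argument in the Hermite basis; the paper may argue differently): expand $f = \sum_\alpha f_\alpha H_\alpha$ in the orthonormal Hermite basis of $L^2(\gamma_d)$. For $1$-Lipschitz (hence weakly differentiable) $f$, one has $\sum_\alpha |\alpha|\, f_\alpha^2 = \|\nabla f\|_{L^2(\gamma_d)}^2 \le 1$, so the projection $P_{\le m} f$ onto polynomials of degree at most $m$ satisfies
$$\big\|f - P_{\le m} f\big\|_{L^2(\gamma_d)}^2 \;=\; \sum_{|\alpha| > m} f_\alpha^2 \;\le\; \frac{1}{m+1}\sum_{|\alpha| > m} |\alpha|\, f_\alpha^2 \;\le\; \frac{1}{m+1},$$
which gives $\Psi_\gamma(m) \lesssim 1/\sqrt{m}$.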
Implementation Details
Projection Estimator: For known μ, compute empirical averages of the basis polynomials against the data, with a variance-reduction step for the mean coefficient. The estimator is
$$\hat f \;=\; \sum_{|\alpha| \le m} \hat f_\alpha \, P_\alpha,$$
where the coefficients $\hat f_\alpha$ are empirical averages as specified in the paper (a minimal numeric sketch appears after this list).
Least-Squares Estimator: Minimizing the empirical squared error over polynomials of degree at most m is a standard least-squares problem (a convex quadratic program) in the coefficients of any polynomial basis.
Choice of Degree m: The optimal m balances the approximation error $\Psi_\mu(m)$ against the estimation error, which scales with the dimension $D = \binom{d+m}{m}$ of the polynomial space and with the sample size n.
Computational Considerations: For moderate m and large d, D can be large, but for $m = O(\log n / \log d)$, D remains subexponential in d whenever n is subexponential in d.
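As a concrete illustration of both procedures in the Gaussian case, here is a minimal NumPy sketch. It is not the paper's implementation: it uses the probabilists' Hermite basis, omits the variance-reduction step for the mean coefficient, and is only practical for small d and m, where $D=\binom{d+m}{m}$ stays moderate. The function names (`multi_indices`, `hermite_features`, `fit`) and the toy target $f(x)=|x_1|$ are illustrative choices, not from the paper.

```python
import numpy as np
from math import comb, factorial
from itertools import product
from numpy.polynomial.hermite_e import hermeval

def multi_indices(d, m):
    """All multi-indices alpha in N^d with |alpha| <= m (only feasible for small d, m)."""
    return [a for a in product(range(m + 1), repeat=d) if sum(a) <= m]

def hermite_features(X, alphas):
    """Design matrix whose columns are the orthonormal Hermite polynomials P_alpha(X_i)."""
    n, d = X.shape
    cols = []
    for alpha in alphas:
        col = np.ones(n)
        for j, k in enumerate(alpha):
            c = np.zeros(k + 1); c[k] = 1.0                 # coefficients of He_k
            col *= hermeval(X[:, j], c) / np.sqrt(factorial(k))
        cols.append(col)
    return np.column_stack(cols)

def fit(X, Y, m):
    """Return coefficient vectors of the projection and least-squares estimators."""
    alphas = multi_indices(X.shape[1], m)
    assert len(alphas) == comb(X.shape[1] + m, m)           # D = binom(d+m, m)
    Phi = hermite_features(X, alphas)
    proj_coef = Phi.T @ Y / len(Y)                          # hat f_alpha = mean_i Y_i P_alpha(X_i)
    ls_coef, *_ = np.linalg.lstsq(Phi, Y, rcond=None)       # empirical-risk minimizer over degree <= m
    return alphas, proj_coef, ls_coef

# Toy example: d = 3, Gaussian design, f(x) = |x_1| (1-Lipschitz), noisy labels.
rng = np.random.default_rng(0)
n, d, m = 5000, 3, 4
X = rng.standard_normal((n, d))
Y = np.abs(X[:, 0]) + 0.1 * rng.standard_normal(n)

alphas, proj_coef, ls_coef = fit(X, Y, m)
Xtest = rng.standard_normal((20000, d))
Phi_test = hermite_features(Xtest, alphas)
truth = np.abs(Xtest[:, 0])
for name, coef in [("projection", proj_coef), ("least-squares", ls_coef)]:
    risk = np.mean((Phi_test @ coef - truth) ** 2)          # Monte Carlo estimate of L2(gamma) risk
    print(f"{name}: estimated squared L2 risk ≈ {risk:.4f}")
```

Note that the empirical-average coefficients are unbiased for the $L^2(\mu)$ coefficients only because the design is drawn from the known reference measure (here the standard Gaussian), whereas the least-squares fit uses only the sample itself; this mirrors the distinction between the two procedures drawn above.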
Main Theoretical Results
Upper Bounds
Projection Estimator: For n in the range $d^5 \le n \le e^{d\log d}$, with $m = \lfloor \log n / \log d \rfloor - 4$, the risk satisfies
$$\mathbb{E}\,\|f - \hat f\|_{L^2(\mu)}^2 \;\le\; \Psi_\mu^2(m) + O(1/d).$$
For larger n, the error term decays as $O(1/m^2)$.
Least-Squares Estimator: Achieves comparable risk bounds in a slightly smaller range of n, with an additional logarithmic factor in the estimation error due to the lack of knowledge of μ.
Gaussian Case: For $\mu = \gamma_d$, the standard Gaussian measure, both estimators achieve the minimax rate, with $\Psi_\gamma(m) \lesssim 1/\sqrt{m}$.
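To get a feel for the scales in these bounds, here is a quick arithmetic check (illustrative numbers only; the degree rule $m=\lfloor\log n/\log d\rfloor-4$, the dimension $D=\binom{d+m}{m}$, and the $1/m$ squared-risk scale are the quantities stated above, and both $(d,n)$ pairs lie in the stated range $d^5\le n\le e^{d\log d}$).

```python
import math

def prescribed_degree(n, d):
    """Degree rule m = floor(log n / log d) - 4 from the projection-estimator bound."""
    return math.floor(math.log(n) / math.log(d)) - 4

for d, n in [(50, 10**12), (100, 2 * 10**18)]:
    m = prescribed_degree(n, d)
    D = math.comb(d + m, m)                 # dimension of the degree-<=m polynomial space
    print(f"d={d}, n={n:.0e}: m={m}, D={D:,}, squared-risk scale ~ 1/m = {1/m:.2f}")
```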
Lower Bounds and Metric Entropy
Metric Entropy of Lipschitz Functions: The paper establishes sharp lower bounds for the metric entropy $H^{\mu}_{\mathrm{Lip}}(\varepsilon)$ of the class of $1$-Lipschitz functions in $L^2(\mu)$, even in the Gaussian case. For isotropic product log-concave measures,
$$H^{\mu}_{\mathrm{Lip}}(\varepsilon) \;\gtrsim\; \binom{d}{\lfloor c/\varepsilon^2 \rfloor}$$
for $\varepsilon \gg d^{-1/4}$.
Minimax Lower Bound: Using Fano's method and the entropy estimates, the minimax risk for learning $1$-Lipschitz functions is lower bounded by
$$R^*_{n,d} \;\gtrsim\; \frac{\log d}{\log n}$$
for n up to $e^{c\,d^{2\eta}\log d}$ (general log-concave) or $e^{c\,d\log d}$ (product case). A heuristic version of this computation appears after this list.
Matching Upper and Lower Bounds: For measures with $\Psi_\mu^2(m) \lesssim 1/m$ (e.g., Gaussian, uniform on the hypercube), the projection and least-squares estimators achieve the minimax rate in the specified sample regimes.
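For intuition, here is a back-of-the-envelope version of the Fano argument (a heuristic sketch only, ignoring constants, the noise level, and the precise packing construction; it is not the paper's proof). Fano's method gives $R^*_{n,d} \gtrsim \varepsilon^2$ at any scale $\varepsilon$ where the metric entropy exceeds the information carried by the sample, roughly $H^{\mu}_{\mathrm{Lip}}(\varepsilon) \gtrsim n\varepsilon^2$. Since $\binom{d}{\lfloor c/\varepsilon^2\rfloor} \approx \exp\!\big(\tfrac{c}{\varepsilon^2}\log\tfrac{d\varepsilon^2}{c}\big)$ for $\varepsilon \gg d^{-1/4}$, taking logarithms in the balance $H^{\mu}_{\mathrm{Lip}}(\varepsilon) \asymp n\varepsilon^2$ gives
$$\frac{c}{\varepsilon^2}\,\log\frac{d\varepsilon^2}{c} \;\asymp\; \log\!\big(n\varepsilon^2\big) \;\approx\; \log n, \qquad\text{so}\qquad \varepsilon^2 \;\asymp\; \frac{\log d}{\log n},$$
matching the displayed lower bound up to constants.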
Contrasts with Classical Nonparametric Regression
Curse of Dimensionality: Classical nonparametric estimators (e.g., nearest neighbors) require n≳exp(d) for nontrivial risk, whereas the polynomial-based estimators here achieve minimax rates for n subexponential in d.
Sample Complexity: To achieve $L^2$ error $\varepsilon$, it suffices to take $n \simeq d^{c/\varepsilon^2}$ samples in the Gaussian case; equivalently, $n$ samples yield error $\varepsilon \asymp \sqrt{\log d/\log n}$, consistent with the minimax rate above.
Technical Innovations
Sharp Entropy Bounds: The paper provides new lower bounds for the metric entropy of $1$-Lipschitz functions under isotropic log-concave measures, using random polynomial constructions and properties of the Langevin semigroup.
Polynomial Approximation Rates: The analysis leverages tensorization and concentration properties of log-concave measures to obtain dimension-dependent rates for polynomial approximation of Lipschitz functions.
Robustness to Unknown Measure: The least-squares estimator is shown to be nearly minimax optimal even without knowledge of μ, provided μ is log-concave and isotropic.
Implications and Future Directions
Practical Implications
High-Dimensional Regression: The results provide a principled approach for regression of Lipschitz functions in high dimensions under log-concave distributions, with provable guarantees in regimes where classical methods fail.
Algorithmic Simplicity: Both estimators are computationally tractable for moderate m and can be implemented using standard linear algebra routines.
Robustness: The least-squares estimator is applicable without knowledge of the underlying measure, making it suitable for practical scenarios with unknown or complex distributions.
Theoretical Implications
Entropy of Function Classes: The entropy bounds for Lipschitz functions under log-concave measures are new and may have further applications in empirical process theory and statistical learning.
Extension to Other Function Classes: The techniques may be adapted to other regularity classes (e.g., Sobolev, bounded variation) and to other structured measures.
Connections to Isoperimetry and Concentration: The results highlight the interplay between functional inequalities (Poincaré, log-Sobolev), polynomial approximation, and statistical estimation.
Future Developments
Sharper Approximation Rates: Further work may refine the polynomial approximation rates for specific log-concave measures, especially in non-product cases.
Beyond Lipschitz Functions: Extending the analysis to broader function classes or to settings with weaker regularity assumptions.
Adaptive Procedures: Developing estimators that adapt to unknown smoothness or intrinsic dimension.
Applications to Active Learning and Experimental Design: Leveraging the entropy and approximation results for optimal sampling strategies in high dimensions.
Conclusion
This paper rigorously characterizes the statistical and metric complexity of learning $1$-Lipschitz functions under log-concave measures in high dimensions. By analyzing polynomial-based estimators and establishing sharp entropy bounds, it demonstrates that minimax-optimal rates are achievable in the subexponential sample regime, in stark contrast to classical nonparametric methods. The results have significant implications for high-dimensional statistics, learning theory, and the analysis of function spaces under structured measures, and open several avenues for further research in both theory and practice.