High-dimensional nonconvex lasso-type $M$-estimators

Published 12 Apr 2022 in math.ST | (2204.05792v2)

Abstract: This paper proposes a theory for $\ell_1$-norm penalized high-dimensional $M$-estimators, with nonconvex risk and unrestricted domain. Under high-level conditions, the estimators are shown to attain the rate of convergence $s_0\sqrt{\log(nd)/n}$, where $s_0$ is the number of nonzero coefficients of the parameter of interest. Sufficient conditions for our main assumptions are then developed and finally used in several examples including robust linear regression, binary classification and nonlinear least squares.

Summary

  • The paper establishes that ℓ1-penalized M-estimators achieve a fast non-asymptotic rate of s₀√(log(nd)/n) in high dimensions, even with nonconvex loss functions.
  • It demonstrates that local strong convexity and uniform convergence over data-dependent ℓ1-balls suffice, bypassing the need for global convexity or compact domain constraints.
  • Applications in robust regression, binary classification, and nonlinear least squares illustrate the method’s broad utility and potential for sparse recovery.

High-dimensional Nonconvex Lasso-type $M$-estimators: Summary and Analysis

Problem Statement and Motivation

This paper provides a comprehensive theoretical analysis of high-dimensional $M$-estimators with $\ell_1$ regularization, where the empirical risk function $\widehat{R}(\theta)$ may be nonconvex and the parameter space $\Theta$ is unrestricted. The regime of interest is when the number of parameters $d$ grows with, and may substantially exceed, the sample size $n$: the canonical high-dimensional statistics scenario. Classical lasso-type results assume convexity and often impose constraints on $\Theta$ (e.g., restricting to $\ell_1$ or $\ell_2$ balls), which may not hold in modern robust or nonlinear supervised learning settings. This work addresses the fundamental question of whether $\ell_1$-regularized estimators can still enjoy fast rates of convergence for general nonconvex $M$-estimation problems without such convexity or domain restrictions.

Main Theoretical Contributions

The primary result establishes that penalized $M$-estimators of the form

$$\widehat{\theta} \in \operatorname*{arg\,min}_{\theta\in\Theta}\left\{\widehat{R}(\theta) + \lambda_n\|\theta\|_1\right\}$$

attain the non-asymptotic estimation rate $s_0\sqrt{\log(nd)/n}$ in $\ell_1$ norm under sparsity, where $s_0$ is the number of nonzero entries of the true parameter $\theta_0$. This rate matches the minimax optimal rate for high-dimensional parameters under standard conditions in convex settings.
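
As a concrete illustration, here is a minimal proximal-gradient sketch for this objective. It is our own sketch, not the paper's algorithm: in the nonconvex case a first-order method only reaches a stationary point, whereas the theory concerns the global minimizer. The function names and default step size are ours.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: coordinatewise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_m_estimator(grad_risk, theta_init, lam, step=0.01, n_iter=2000):
    """Proximal gradient descent on R_hat(theta) + lam * ||theta||_1.

    grad_risk: callable returning the gradient of the (possibly nonconvex)
    empirical risk R_hat at theta. Returns a stationary point only; the
    rate s_0 * sqrt(log(nd)/n) is proved for the global minimizer.
    """
    theta = theta_init.copy()
    for _ in range(n_iter):
        theta = soft_threshold(theta - step * grad_risk(theta), step * lam)
    return theta
```

In line with the theory, the penalty level would be taken on the scale $\lambda_n \asymp \sqrt{\log(nd)/n}$.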

Critically, the analysis weakens the usual global convexity or restricted strong convexity requirements. Instead, local strong convexity of the population risk $R(\cdot)$ in an $\ell_2$-ball around $\theta_0$ suffices, and the empirical risk $\widehat{R}$ only needs to converge uniformly over a growing $\ell_1$-ball. The penalty ensures that, with high probability, the estimator $\widehat{\theta}$ lies within a data-driven $\ell_1$-ball whose radius grows only slowly with $n$ and $d$.

No explicit restrictions are needed on the domain $\Theta$. This generality is significant since the previous high-dimensional lasso literature, particularly for nonconvex empirical risk, typically requires optimization to be constrained to compact sets to guarantee uniform convergence and manage local minima.

High-Level Assumptions and Sufficient Conditions

The following high-level assumptions underpin the main results:

  • Identification: $R(\theta)$ achieves a unique minimum at $\theta_0$ and possesses a form of uniform local identifiability in a neighborhood of $\theta_0$.
  • Uniform Convergence: The empirical risk $\widehat{R}(\theta)$ uniformly approximates $R(\theta)$ over the relevant region (the $\ell_1$-ball).
  • Local Strong Convexity: $R(\cdot)$ is locally strongly convex near $\theta_0$; no global convexity is required (a numerical illustration follows this list).
  • Deviation Control: The increments $\widehat{\Delta}(\theta) - \widehat{\Delta}(\theta_0)$, where $\widehat{\Delta}$ denotes the centered empirical process $\widehat{R} - R$, scale in $\ell_1$ norm as $O\big(\sqrt{\log(nd)/n}\big)$ with high probability.
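
The local strong convexity condition can be checked numerically in specific models. The sketch below is a Monte Carlo illustration for the population logistic risk, under an assumed standard Gaussian design and a hypothetical $s_0$-sparse, unit-norm $\theta_0$; it illustrates the flavor of the assumption rather than verifying the paper's exact condition.

```python
import numpy as np

rng = np.random.default_rng(0)
d, s0, n_mc = 30, 5, 100_000

# hypothetical s0-sparse truth with unit ell_2 norm
theta0 = np.zeros(d)
theta0[:s0] = 1.0 / np.sqrt(s0)

# Monte Carlo estimate of the population Hessian of the logistic risk at
# theta0: H = E[ sigma'(x' theta0) x x' ], here with standard Gaussian x.
X = rng.standard_normal((n_mc, d))
s = 1.0 / (1.0 + np.exp(-(X @ theta0)))   # sigma(x' theta0)
w = s * (1.0 - s)                         # sigma'(x' theta0) > 0
H = (X * w[:, None]).T @ X / n_mc

# a strictly positive minimum eigenvalue indicates strong convexity at theta0
print("lambda_min(H):", np.linalg.eigvalsh(H).min())
```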

The authors systematically analyze when these high-level conditions are guaranteed in practice, providing explicit sufficient conditions on the risk and empirical process structure. Notably, the crucial deviation conditions are derived via contraction arguments and symmetrization, using empirical process theory adapted for dependent, high-dimensional parameter sets.
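
For intuition, the following display sketches the standard symmetrization-plus-contraction route for losses of the form $\ell(\theta; Z) = \varphi(X^\top\theta, Y)$ with $\varphi$ $L$-Lipschitz in its first argument; the paper's precise argument differs in its details. For i.i.d. Rademacher signs $\varepsilon_i$ and a bounded or sub-Gaussian design,

$$\mathbb{E}\sup_{\|\theta\|_1\le r}\big|\widehat{R}(\theta)-R(\theta)\big| \;\le\; 2\,\mathbb{E}\sup_{\|\theta\|_1\le r}\Big|\frac{1}{n}\sum_{i=1}^n \varepsilon_i\,\ell(\theta;Z_i)\Big| \;\lesssim\; L\,r\,\mathbb{E}\max_{1\le j\le d}\Big|\frac{1}{n}\sum_{i=1}^n \varepsilon_i X_{ij}\Big| \;\lesssim\; L\,r\,\sqrt{\frac{\log d}{n}},$$

so the uniform deviation over an $\ell_1$-ball of radius $r$ grows linearly in $r$ but only logarithmically in the dimension.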

Applications

The findings are instantiated in several fundamental contexts:

  • Robust Regression: For robust estimators with a bounded, differentiable loss (e.g., Tukey's bisquare), when the design $X$ is sub-Gaussian and bounded, the estimator achieves $\ell_1$ estimation error $O_P(s_0\sqrt{\log(nd)/n})$ without restricting $\Theta$. Compared to prior work, the absence of domain constraints is highlighted (a simulation sketch follows this list).
  • Binary Classification: For models with a general link function $\sigma$ (e.g., logistic), as long as $\sigma$ is strictly increasing and bounded, similar rates are obtained. The key requirement is a bounded $\ell_2$ norm of $\theta_0$.
  • Nonlinear Least Squares: When the nonlinear function $f$ is differentiable, bounded, and strictly increasing, and the regression error is Gaussian, a matching rate is proven.
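
To make the robust-regression setting concrete, here is a hedged simulation sketch (our construction, not an experiment from the paper), reusing `soft_threshold` and `lasso_m_estimator` from the earlier sketch: a pilot squared-loss lasso warm start, then proximal gradient on the Tukey bisquare risk, with $\lambda_n$ on the theoretical $\sqrt{\log(nd)/n}$ scale.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, s0, c = 200, 1000, 5, 4.685          # c: standard Tukey tuning constant

theta_true = np.zeros(d)
theta_true[:s0] = 1.0
X = rng.standard_normal((n, d))
y = X @ theta_true + rng.standard_t(df=2, size=n)   # heavy-tailed errors

def psi(r):
    # Tukey bisquare score: rho'(r) = r * (1 - (r/c)^2)^2 for |r| <= c, else 0
    return np.where(np.abs(r) <= c, r * (1.0 - (r / c) ** 2) ** 2, 0.0)

lam = 0.5 * np.sqrt(np.log(n * d) / n)     # lambda_n on the theoretical scale
step = 0.5 / np.linalg.eigvalsh(X.T @ X / n).max()  # conservative step size

grad_ls = lambda th: -X.T @ (y - X @ th) / n        # squared-loss gradient
grad_tukey = lambda th: -X.T @ psi(y - X @ th) / n  # Tukey risk gradient

pilot = lasso_m_estimator(grad_ls, np.zeros(d), lam, step=step, n_iter=200)
theta_hat = lasso_m_estimator(grad_tukey, pilot, lam, step=step, n_iter=2000)

print("ell_1 error:", np.abs(theta_hat - theta_true).sum())
print("rate scale s0*sqrt(log(nd)/n):", s0 * np.sqrt(np.log(n * d) / n))
```

The warm start reflects the caveat above: proximal gradient on the nonconvex Tukey risk only finds a stationary point, so in practice one initializes near a reasonable pilot estimate.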

Each application section carefully verifies that all technical conditions are met in these cases and compares the assumptions to those in prior results, such as [loh2017statistical] and [mei2018landscape], demonstrating broader applicability regarding parameter domains at the cost of focusing on global minima only.

Implications and Future Directions

The results have strong practical and theoretical implications:

  • General Validity of Lasso in Nonconvex Settings: The work demonstrates that $\ell_1$-penalized $M$-estimation in high dimensions retains its statistical optimality under minimal convexity conditions, greatly expanding the class of admissible loss functions and thus broadening applicability to robust and nonlinear problems.
  • No Need for Artificial Domain Constraints: One can analyze penalized estimators on the entire ambient space, as opposed to previous analyses requiring constraint sets, provided the empirical process is well behaved in a data-dependent neighborhood.
  • Decoupling Optimization and Statistical Accuracy: While the estimator may not always be computable (due to possible nonconvex local minima), the statistical argument shows that if the global minimizer is obtained, optimal rates are achieved. Recent literature addresses optimization-statistics tradeoffs for local minima; this paper chooses to focus on global minima.
  • Pathway to Variable Selection: The study leaves open the variable selection problem (support recovery), which typically requires incoherence conditions for $\ell_1$ penalization. The analysis points to a possible benefit from nonconvex regularizers such as SCAD or MCP, as suggested by [loh2017support] (a sketch of these penalties follows this list).
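
For reference, the two nonconvex penalties mentioned above are sketched below, using their standard formulas and usual default tuning constants from the literature (this is illustrative code, not code from the paper):

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty of Fan & Li (2001), applied elementwise to |t|."""
    t = np.abs(t)
    return np.where(
        t <= lam, lam * t,
        np.where(t <= a * lam,
                 (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                 lam**2 * (a + 1) / 2))

def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty (MCP) of Zhang (2010), elementwise on |t|."""
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2 * gamma),
                    gamma * lam**2 / 2)
```

Both penalties coincide with $\lambda|t|$ near the origin but flatten out for large $|t|$, which reduces the bias of the lasso on large coefficients and is what enables stronger support-recovery guarantees.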

Potential extensions include: investigating oracle inequalities for prediction error (not just estimation error), relaxing the uniform identifiability assumptions, extending the analysis to semiparametric $M$-estimators, and bridging to results for local minima in nonconvex landscapes.

Conclusion

This work rigorously establishes that $\ell_1$-regularized $M$-estimators, computed as global minimizers of possibly nonconvex empirical losses, exhibit optimal $\ell_1$ error rates in high dimensions, independent of constraints on the parameter space and under only local curvature assumptions. The approach broadens theoretical guarantees for sparse recovery well beyond the convex setting, and the results inform the design and analysis of high-dimensional estimators in robust and nonlinear modeling scenarios. Further research on optimization/statistical tradeoffs and variable selection in this setting is a promising direction.


Reference:

"High-dimensional nonconvex lasso-type MM-estimators" (2204.05792)
