Convex M-Estimation for Robust Inference

Updated 5 March 2026

Convex M-estimation is a framework that minimizes convex loss functions, often with a regularizer, to ensure global optimality and computational tractability.
It extends to geodesic settings for structured covariance estimation and robust high-dimensional modeling in various statistical applications.
The approach offers strong theoretical guarantees, including asymptotic normality and convergence rates, using efficient algorithms like proximal gradient and splitting methods.

Convex M-estimation is a central methodological framework in statistics and machine learning, underlying robust inference in high-dimensional linear and multivariate models, distributional settings, and structured optimization scenarios. The defining feature is the minimization of an empirical criterion consisting of a convex loss function (possibly non-smooth and/or regularized) with respect to potentially high-dimensional parameters. Convex M-estimators enjoy strong theoretical guarantees, including global optimality, transparent asymptotic behavior, and computational tractability through modern first-order algorithms or geometric solvers.

1. Problem Formulation and Representative Models

Convex M-estimators are specified for observed data $\{Z_i\}_{i=1}^n$ by solving

$\hat \theta_n \in \arg\min_{\theta\in\Theta} \frac{1}{n}\sum_{i=1}^n \ell(Z_i,\theta) + g(\theta),$

where $\ell(z,\cdot)$ is convex in $\theta$ for each data point $z$, $\Theta$ is a closed convex set (possibly infinite-dimensional), and $g$ is a convex penalty (regularizer). Classical examples include linear regression with a robust loss (e.g., the Huber loss), regularized covariance/precision estimation, and penalized inference problems.

For linear models $y = X\beta + \epsilon$, a prominent form is

$\hat\beta = \arg\min_{b\in\mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n \rho(y_i-x_i^\top b) + g(b),$

with $\rho$ convex, e.g., the Huber loss or smoothed variants, and $g$ a strongly convex or elastic-net penalty (Bellec et al., 2021). In covariance matrix estimation, convex M-estimation often leverages geodesic convexity on the Riemannian manifold of positive definite matrices, enabling robust and structured inference for high-dimensional covariance structures (Ollila et al., 2016, Duembgen et al., 2016).
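The regression criterion above can be sketched numerically. The following is a minimal illustration, not the procedure of the cited papers: it assumes the Huber loss with a fixed threshold `delta`, an elastic-net penalty $g(b)=\lambda_1\|b\|_1+\tfrac{\lambda_2}{2}\|b\|_2^2$, and a plain proximal gradient loop. The function names, the simulated data, and the tuning values `lam1`, `lam2` are all hypothetical.

```python
import numpy as np

def huber_loss(r, delta=1.345):
    """Huber loss: quadratic on [-delta, delta], linear outside."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def huber_grad(r, delta=1.345):
    """Derivative rho'(r) of the Huber loss (the residual clipped to [-delta, delta])."""
    return np.clip(r, -delta, delta)

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(X, y, b, lam1, lam2, delta=1.345):
    """Empirical Huber risk plus elastic-net penalty."""
    return (huber_loss(y - X @ b, delta).mean()
            + lam1 * np.abs(b).sum() + 0.5 * lam2 * b @ b)

def huber_enet(X, y, lam1=0.05, lam2=0.01, delta=1.345, n_iter=500):
    """Proximal gradient on: smooth part = Huber risk + ridge, nonsmooth part = l1."""
    n, p = X.shape
    # Lipschitz constant of the smooth gradient (rho'' <= 1 for the Huber loss).
    L = np.linalg.norm(X, 2) ** 2 / n + lam2
    b = np.zeros(p)
    for _ in range(n_iter):
        grad = -X.T @ huber_grad(y - X @ b, delta) / n + lam2 * b
        b = soft_threshold(b - grad / L, lam1 / L)
    return b

# Simulated example with heavy-tailed noise (illustrative values only).
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]
y = X @ beta + rng.standard_t(df=2, size=n)
b_hat = huber_enet(X, y)
```

Because the objective is convex and the step size is $1/L$, each iteration does not increase the objective, and any limit point is a global minimizer.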

2. Convexity Structures and Geometric Extensions

Modern treatments emphasize not only Euclidean convexity but also geodesic convexity, which is crucial for scatter/covariance problems. On the set of positive definite matrices $\mathcal{S}_{++}(p)$, the affine-invariant Riemannian metric induces the geodesic from $\Sigma_0$ to $\Sigma_1$

$\gamma(t) = \Sigma_0^{1/2}\left(\Sigma_0^{-1/2}\Sigma_1\Sigma_0^{-1/2}\right)^t\Sigma_0^{1/2},\quad t\in[0,1].$

A function $f$ is g-convex if $f(\gamma(t)) \leq (1-t)f(\Sigma_0) + t f(\Sigma_1)$ for all $\Sigma_0, \Sigma_1 \in \mathcal{S}_{++}(p)$ and all $t\in[0,1]$. For M-estimators of scatter, this property ensures that local optima are global and, under coercivity, that the solution is unique (Duembgen et al., 2016). Regularized M-estimators combine g-convex data-fit terms with g-convex penalties, e.g., $\pi(\Sigma) = \mathrm{tr}(\Sigma)+\mathrm{tr}(\Sigma^{-1})$, enforcing shrinkage toward the identity or multiples thereof (Duembgen et al., 2016, Ollila et al., 2016).

This geometric perspective generalizes to metric spaces $\mathcal{M}$, enabling robust M-estimation in nonlinear or manifold-valued settings under only geodesic convexity of the loss (Brunel, 2023).
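For the positive definite case, the geodesic formula and the g-convexity inequality can be checked numerically. The sketch below evaluates the penalty $\pi(\Sigma)=\mathrm{tr}(\Sigma)+\mathrm{tr}(\Sigma^{-1})$ along one geodesic between two randomly generated matrices and compares it with the chord. The helper names and the random test matrices are assumptions made for illustration, and a single geodesic is of course only a sanity check, not a proof.

```python
import numpy as np

def spd_power(A, t):
    """A**t for a symmetric positive definite matrix, via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w**t) @ V.T

def geodesic(S0, S1, t):
    """Point gamma(t) on the affine-invariant geodesic from S0 to S1."""
    R = spd_power(S0, 0.5)
    R_inv = spd_power(S0, -0.5)
    M = R_inv @ S1 @ R_inv
    M = 0.5 * (M + M.T)  # symmetrize against round-off
    return R @ spd_power(M, t) @ R

def penalty(S):
    """pi(S) = tr(S) + tr(S^{-1})."""
    return np.trace(S) + np.trace(np.linalg.inv(S))

def random_spd(rng, p):
    """Random well-conditioned positive definite matrix (illustrative)."""
    A = rng.standard_normal((p, p))
    return A @ A.T + 0.5 * np.eye(p)

rng = np.random.default_rng(1)
S0, S1 = random_spd(rng, 4), random_spd(rng, 4)
ts = np.linspace(0.0, 1.0, 11)
values = np.array([penalty(geodesic(S0, S1, t)) for t in ts])
chords = (1 - ts) * penalty(S0) + ts * penalty(S1)
gaps = chords - values  # non-negative when the g-convexity inequality holds
```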

3. Asymptotic Theory and Variance Estimation

Convex M-estimators admit comprehensive asymptotic characterizations, even when the loss is non-smooth and/or the parameter is high-dimensional. For unconstrained smooth problems, classical theory yields

$\sqrt{n}(\hat\theta_n-\theta^*) \xrightarrow{d} N(0, S^{-1}BS^{-1}),$

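with $S=\nabla^2\Phi(\theta^*)$ the Hessian of the population risk $\Phi(\theta)=\mathbb{E}[\ell(Z,\theta)]$ at its minimizer $\theta^*$, and $B = \mathrm{Var}(g(Z))$, where $g(Z)$ here denotes a (sub)gradient of the loss at $\theta^*$ and not the penalty of Section 1 (Brunel, 6 Nov 2025). For constrained settings, the limit law incorporates the structure of the constraint: $\sqrt{n}(\hat\theta_n-\theta^*) \xrightarrow{d} D\pi_{\Theta-\theta^*}^S(-S^{-1}\nabla\Phi(\theta^*);Z),$ where $D\pi$ is the directional derivative of the $S$-metric projection onto the support cone at $\theta^*$ (Brunel, 6 Nov 2025).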

In high-dimensional regression, coordinate-wise asymptotic normality of robust convex M-estimators, including de-biasing corrections, can be established even when the dimension $p$ is allowed to grow with $n$ so that $p/n=O(1)$. Key results include data-driven variance estimators,

$\hat V = \frac{\|\psi\|_2^2 / n}{(\mathrm{Tr}(\nabla_y \psi)/n)^2},$

where $\psi = \rho'(y - X\hat \beta)$, with $\rho'$ applied componentwise to the residuals, and $\nabla_y \psi$ is the Jacobian of $\psi$ with respect to the response vector $y$ (Bellec et al., 2021). Effective sample sizes and degrees-of-freedom decompose the variance contributions.
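To make the ingredients of $\hat V$ concrete, here is a minimal sketch for one special case that is an assumption of this example, not of the cited work: unpenalized Huber regression with a fixed threshold. Differentiating the stationarity condition $X^\top\psi(y-X\hat\beta)=0$ with respect to $y$ gives $\nabla_y\psi = D - DX(X^\top DX)^{-1}X^\top D$ with $D=\mathrm{diag}(\rho''(r_i))$, so the trace equals the number of residuals inside the threshold minus $p$ (valid when $X^\top D X$ is invertible and no residual sits exactly at the threshold). The solver, the data, and all names are hypothetical; how $\hat V$ enters a confidence interval depends on the normalization used in the cited paper.

```python
import numpy as np

def huber_psi(r, delta=1.345):
    """rho'(r) for the Huber loss."""
    return np.clip(r, -delta, delta)

def huber_irls(X, y, delta=1.345, n_iter=200, tol=1e-10):
    """Unpenalized Huber regression by iteratively reweighted least squares."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ b
        w = np.minimum(1.0, delta / np.maximum(np.abs(r), 1e-12))  # psi(r) / r
        b_new = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b

rng = np.random.default_rng(2)
n, p, delta = 400, 40, 1.345
X = rng.standard_normal((n, p))
y = X @ np.ones(p) + rng.standard_t(df=3, size=n)

b_hat = huber_irls(X, y, delta)
r = y - X @ b_hat
psi = huber_psi(r, delta)

# Jacobian of psi with respect to y, from implicit differentiation.
d = (np.abs(r) <= delta).astype(float)  # rho''(r_i), defined almost everywhere
XtDX = X.T @ (d[:, None] * X)
jac = np.diag(d) - (d[:, None] * X) @ np.linalg.solve(XtDX, X.T * d)
trace_jac = np.trace(jac)

V_hat = (psi @ psi / n) / (trace_jac / n) ** 2
```

In this special case `trace_jac` coincides with `d.sum() - p`, which avoids forming the $n\times n$ Jacobian at all.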

For covariance matrix M-estimation, asymptotic normality extends to geodesic settings, yielding analogous limiting distributions via tangent-space analysis (Brunel, 2023).

4. Penalization, Regularization, and Structured Convex M-Estimation

Convex M-estimation frameworks readily accommodate a variety of regularizers and structured constraints, both in vector and matrix inference. Typical penalties include:

Strong convexity-inducing penalties ($\ell_2^2$, elastic-net) for parametric models (Bellec et al., 2021),
Grouped or structured sparsity for mixed-effects or heterogeneous data (Ollila et al., 2016),
Matrix penalties (e.g., Kullback-Leibler, Riemannian, ellipticity-based) for multi-group scatter estimation, yielding solutions corresponding to arithmetic, geometric, or harmonic means of matrices (Ollila et al., 2016).
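The three matrix means named in the last item can be illustrated for two positive definite matrices with equal weights. This is only a sketch of the means themselves: the pooled estimators in the cited work involve several groups and data-fit terms that are not reproduced here, and the matrices `A`, `B` and the function names are made up for the example.

```python
import numpy as np

def spd_power(A, t):
    """A**t for a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * w**t) @ V.T

def arithmetic_mean(A, B):
    return 0.5 * (A + B)

def harmonic_mean(A, B):
    return np.linalg.inv(0.5 * (np.linalg.inv(A) + np.linalg.inv(B)))

def geometric_mean(A, B):
    """Midpoint of the affine-invariant geodesic between A and B."""
    R, R_inv = spd_power(A, 0.5), spd_power(A, -0.5)
    M = R_inv @ B @ R_inv
    return R @ spd_power(0.5 * (M + M.T), 0.5) @ R

A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
am, gm, hm = arithmetic_mean(A, B), geometric_mean(A, B), harmonic_mean(A, B)
```

The three means are ordered in the positive semidefinite sense (harmonic below geometric below arithmetic), which can be verified by checking that the eigenvalues of `am - gm` and `gm - hm` are non-negative.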

Perspective M-estimation unifies these approaches: penalties, concomitant scale modeling, and composite structures are all incorporated through perspective functions, supporting modular optimization via proximal splitting (Combettes et al., 2018). The design of penalties is essential for ensuring well-posedness, uniqueness, and statistical efficiency, as exemplified in generalized Gaussian and robust graphical model estimation (Ouzir et al., 2023, Zhang et al., 2013).
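As a brief worked illustration of the perspective construction (the symbols $\tilde\rho$, $\sigma$, and the constant $a>0$ are notation assumed here, not taken from the source), the perspective of a convex loss $\rho$ is

$\tilde\rho(\sigma, r) = \sigma\,\rho(r/\sigma),\quad \sigma>0,$

which is jointly convex in $(\sigma, r)$. A regression estimator with concomitant scale can then be written as

$(\hat\beta,\hat\sigma) \in \arg\min_{b\in\mathbb{R}^p,\ \sigma>0} \frac{1}{n}\sum_{i=1}^n \left[\sigma\,\rho\!\left(\frac{y_i-x_i^\top b}{\sigma}\right) + a\,\sigma\right] + g(b).$

For example, with $\rho(r)=r^2/2$ and $a=1/2$, minimizing over $\sigma$ for fixed $b$ gives $\sigma^2=\frac{1}{n}\sum_{i=1}^n (y_i-x_i^\top b)^2$, and substituting back leaves the criterion

$\left(\frac{1}{n}\sum_{i=1}^n (y_i-x_i^\top b)^2\right)^{1/2} + g(b),$

so the scale is estimated jointly with the regression coefficients while the problem remains convex.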

5. Algorithmic Strategies and Computational Guarantees

Optimization of the convex M-estimator typically proceeds via first-order or splitting methods:

Proximal gradient and accelerated schemes for smooth (or composite) problems,
Geodesic partial-Newton or block-coordinate solvers for manifold-valued and matrix-variable cases (Duembgen et al., 2016, Ollila et al., 2016),
Primal-dual and Douglas–Rachford algorithms in settings with complex regularization or perspective structure (Combettes et al., 2018, Ouzir et al., 2023).
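As a small instance of the splitting methods listed above, the following sketch applies Douglas-Rachford splitting to a deliberately simple stand-in problem, least squares with an $\ell_1$ penalty, chosen because both proximal operators are available in closed form. It is not the perspective-function algorithm of the cited papers; the step parameter `gamma`, the penalty level `lam`, and the simulated data are assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def douglas_rachford_lasso(X, y, lam, gamma=1.0, n_iter=500):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by Douglas-Rachford splitting."""
    n, p = X.shape
    # The prox of gamma * f, with f(b) = (1/2n)||y - Xb||^2, is a linear solve.
    A = np.eye(p) + gamma * X.T @ X / n
    c = gamma * X.T @ y / n
    z = np.zeros(p)
    for _ in range(n_iter):
        u = np.linalg.solve(A, z + c)               # u = prox_{gamma f}(z)
        v = soft_threshold(2 * u - z, gamma * lam)  # v = prox_{gamma g}(2u - z)
        z = z + v - u                               # update of the governing sequence
    return v

rng = np.random.default_rng(3)
n, p, lam = 100, 8, 0.1
X = rng.standard_normal((n, p))
beta = np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 0.5])
y = X @ beta + 0.5 * rng.standard_normal(n)

b_dr = douglas_rachford_lasso(X, y, lam)
score = X.T @ (y - X @ b_dr) / n  # optimality requires |score_j| <= lam for every j
```

At convergence `u` and `v` agree; returning `v` gives the iterate that is exactly sparse.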

Normal equations and fixed-point updates are available for many robust losses, e.g., iteratively reweighted least squares or MM approaches for generalized Gaussian models (Zhang et al., 2013). Geodesic convexity guarantees that every local minimum is global, so any stationary point reached by these iterations is a global minimizer; under strict convexity or coercivity, uniqueness of the solution is typically guaranteed as well (Duembgen et al., 2016, Ollila et al., 2016, Ouzir et al., 2023).
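A fixed-point update of this kind can be sketched with Tyler's M-estimator of scatter, a standard example that is swapped in here for illustration and is not named in the text above. The sketch assumes data centered at a known location (zero), $n>p$, and a trace normalization $\mathrm{tr}(\Sigma)=p$ to fix the scale; the function name and the simulated heavy-tailed sample are hypothetical.

```python
import numpy as np

def tyler_scatter(X, n_iter=200, tol=1e-10):
    """Fixed-point iteration S <- (p/n) * sum_i x_i x_i^T / (x_i^T S^{-1} x_i)."""
    n, p = X.shape
    S = np.eye(p)
    for _ in range(n_iter):
        # d_i = x_i^T S^{-1} x_i for every observation
        d = np.einsum("ij,ij->i", X @ np.linalg.inv(S), X)
        S_new = (p / n) * (X / d[:, None]).T @ X
        S_new *= p / np.trace(S_new)  # normalize the scale: tr(S) = p
        if np.linalg.norm(S_new - S, "fro") < tol:
            return S_new
        S = S_new
    return S

# Heavy-tailed elliptical sample: Gaussian directions times Cauchy-like radii.
rng = np.random.default_rng(4)
n, p = 500, 3
L = np.array([[1.0, 0.0, 0.0], [0.6, 1.0, 0.0], [0.2, -0.4, 1.0]])
X = (rng.standard_normal((n, p)) @ L.T) * np.abs(rng.standard_cauchy(n))[:, None]

S_hat = tyler_scatter(X)

# One more update should leave the estimate essentially unchanged.
d = np.einsum("ij,ij->i", X @ np.linalg.inv(S_hat), X)
S_next = (p / n) * (X / d[:, None]).T @ X
S_next *= p / np.trace(S_next)
residual = np.linalg.norm(S_next - S_hat, "fro")
```

The update depends on each observation only through its direction, which is why the Cauchy-like radii do not destabilize the estimate.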

6. Statistical Optimality, Robustness, and Distribution-Free Theory

Convex M-estimation is central to robust statistics. For linear regression, optimal convex losses (in the sense of minimizing asymptotic variance among all convex M-estimators) are characterized by decreasing scores that best approximate the noise score, and in non-log-concave scenarios, are necessarily non-maximum likelihood (e.g., Huber-type) (Feng et al., 2024). This yields estimators achieving high asymptotic efficiency even under contamination or heavy-tailed noise.
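The comparison of losses by asymptotic variance can be made concrete in the simplest setting. The sketch below assumes a location model with standard Gaussian noise and known unit scale, in which the asymptotic variance of an M-estimator with score $\psi=\rho'$ is $\mathbb{E}[\psi(Z)^2]/(\mathbb{E}[\psi'(Z)])^2$, and evaluates it in closed form for the Huber score. The function names and the threshold value 1.345 are choices made for this example.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def huber_asymptotic_variance(delta):
    """E[psi(Z)^2] / (E[psi'(Z)])^2 for Z ~ N(0, 1) and the Huber score psi."""
    inlier_prob = 2.0 * std_normal_cdf(delta) - 1.0  # E[psi'(Z)] = P(|Z| <= delta)
    # E[Z^2 1{|Z| <= delta}], obtained by integration by parts
    second_moment_inside = inlier_prob - 2.0 * delta * std_normal_pdf(delta)
    e_psi_sq = second_moment_inside + delta**2 * (1.0 - inlier_prob)
    return e_psi_sq / inlier_prob**2

# Efficiency relative to the sample mean, whose asymptotic variance is 1 here.
efficiency = 1.0 / huber_asymptotic_variance(1.345)
```

With this threshold the efficiency is approximately 0.95: the Huber estimator gives up about 5% efficiency at the Gaussian model in exchange for bounded influence of outliers.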

Distribution-free estimation is possible in convex M-estimation without strong assumptions: the key is the absence of subgradient blow-up in the loss function over the interior of $\Theta$, and a vanishing “gap-to-infinity” for unbounded $\Theta$. Classical learnability, VC dimension, or Rademacher complexity bounds are neither necessary nor sufficient; first-order control (local Lipschitzness) of the loss alone suffices for consistent minimax estimation (Areces et al., 28 May 2025).

7. Convergence Rates and Empirical Process Theory

Refined deviation inequalities and ergodic properties hold for convex M-estimators, including supremal exponential and polynomial tail bounds for the deviations of the estimator process across sample sizes (Ferger, 2023). Under boundedness or moment conditions, one establishes $r$-complete and $r$-quick convergence, with precise exponential rates governed by directional derivatives of the loss criterion.

For U-statistic-based convex M-estimators (Oja medians, robust scatter), identical limit theorems and convergence results extend the classical theory to depth-based or pairwise robust functionals (Brunel, 6 Nov 2025).

Convex M-estimation thus represents a flexible, theoretically grounded, and computationally reliable paradigm for robust, high-dimensional inference, with broad applicability in robust regression, covariance/precision estimation, structured and manifold-valued models, and nonparametric or distribution-free statistics. The maturity of the theory now includes asymptotic normality, variance estimation, algorithmic stability, and comprehensive statistical guarantees under minimal assumptions (Bellec et al., 2021, Ollila et al., 2016, Brunel, 6 Nov 2025, Brunel, 2023, Feng et al., 2024, Zhang et al., 2013).