Group Lasso Rank Regression
- The paper presents a robust estimator that fuses Wilcoxon rank loss with group lasso to handle heavy-tailed noise and natural predictor groupings.
- It details a simulation-based tuning rule and a scalable PALM+SSN algorithm that efficiently manages high-dimensional regression tasks.
- Empirical results show improved variable selection and resilience to non-Gaussian errors compared to traditional least squares and l1-based methods.
Group lasso regularized rank regression refers to a class of methods that combine rank-based loss functions—offering robustness to heavy-tailed noise and outliers—with structured group-wise sparsity penalties (group lasso), enabling variable selection at the level of pre-specified groups of predictors. This formulation is particularly suitable for high-dimensional regression problems where predictors exhibit natural groupings and the underlying error distribution may deviate substantially from normality (Lin et al., 13 Oct 2025).
1. Mathematical Formulation and Statistical Properties
The fundamental estimator seeks the minimizer

$$\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \; L_n(\beta) + \lambda \, P(\beta),$$

where the loss function is a Wilcoxon score-based rank-type objective,

$$L_n(\beta) = \frac{1}{n(n-1)} \sum_{i \neq j} \bigl| (y_i - x_i^\top \beta) - (y_j - x_j^\top \beta) \bigr|,$$

and the penalty is a group lasso,

$$P(\beta) = \sum_{g=1}^{G} w_g \, \|\beta_{\mathcal{G}_g}\|_2,$$

with groupings $\mathcal{G}_1, \dots, \mathcal{G}_G$ and nonnegative group weights $w_1, \dots, w_G$ (Lin et al., 13 Oct 2025).
Key properties of the model:
- The rank loss is location-invariant and robust to outliers, addressing regression under heavy-tailed noise.
- The group lasso penalty induces sparsity at the group level, allowing selection or exclusion of entire groups.
- The use of positive weights $w_g$ enables balancing group penalties across disparate group sizes or prior importances.
Theoretical analysis demonstrates that, under mild conditions and with appropriately chosen $\lambda$, the estimator admits finite-sample error bounds even under non-Gaussian errors. The pivotal property of the rank loss subgradient at the true coefficient vector $\beta^*$ enables parameter selection without dependence on unknown error distributions (Lin et al., 13 Oct 2025).
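For concreteness, the following is a minimal NumPy sketch of evaluating this objective for a candidate coefficient vector; the pairwise form of the Wilcoxon loss and its normalization follow the standard convention written above, and all function names are illustrative rather than part of any released implementation.

```python
import numpy as np

def wilcoxon_rank_loss(y, X, beta):
    """Pairwise (Jaeckel/Wilcoxon-type) rank loss of the residuals."""
    r = y - X @ beta                              # residual vector
    n = len(y)
    pair_abs = np.abs(r[:, None] - r[None, :])    # |r_i - r_j| for all pairs
    return pair_abs.sum() / (n * (n - 1))         # diagonal terms are zero

def group_lasso_penalty(beta, groups, weights):
    """Weighted sum of Euclidean norms over pre-specified index groups."""
    return sum(w * np.linalg.norm(beta[idx]) for idx, w in zip(groups, weights))

def objective(y, X, beta, groups, weights, lam):
    """Rank loss plus group lasso penalty at regularization level lam."""
    return wilcoxon_rank_loss(y, X, beta) + lam * group_lasso_penalty(beta, groups, weights)
```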
2. Data-Driven and Simulation-Based Tuning Parameter Selection
A major challenge in regularized regression is tuning parameter selection. For rank-based group lasso, the dual-norm pivotality can be exploited to devise a distribution-free, simulation-based selection rule. Specifically, let $S_n$ denote the subgradient of the rank loss at $\beta^*$:

$$S_n = -\frac{2}{n(n-1)} \sum_{i=1}^{n} \bigl(2R_i - n - 1\bigr)\, x_i,$$

where the rank vector $(R_1, \dots, R_n)$ of the true errors encodes a function of a random permutation of the indices $\{1, \dots, n\}$ and is therefore free of the error distribution.
The dual norm of the group lasso penalty,

$$\|v\|^{*} = \max_{1 \le g \le G} \frac{\|v_{\mathcal{G}_g}\|_2}{w_g},$$

is evaluated over Monte Carlo sampled replicates $S_n^{(1)}, \dots, S_n^{(B)}$ to estimate a high quantile $Q_{1-\alpha}$ of $\|S_n^{(b)}\|^{*}$, yielding the selection

$$\lambda = c \cdot Q_{1-\alpha}, \qquad c \ge 1.$$

This rule provides an automatable, cross-validation-free choice of $\lambda$ that remains valid regardless of the error distribution (Lin et al., 13 Oct 2025).
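A minimal sketch of this pivotal rule, assuming the pairwise Wilcoxon loss and dual norm written above (the constants `c`, `alpha`, and `n_sim` are illustrative defaults, not values from the paper), could look as follows:

```python
import numpy as np

def pivotal_lambda(X, groups, weights, n_sim=500, alpha=0.05, c=1.1, seed=0):
    """Simulation-based tuning: replicate the rank-loss subgradient at the
    truth via random permutations (ranks of the true errors), evaluate the
    dual norm of the group penalty, and return a scaled empirical quantile."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    stats = np.empty(n_sim)
    for b in range(n_sim):
        ranks = rng.permutation(n) + 1                             # ranks form a random permutation
        s = -2.0 * (X.T @ (2 * ranks - n - 1)) / (n * (n - 1))     # simulated subgradient S_n
        # dual norm of the group lasso: max_g ||s_g||_2 / w_g
        stats[b] = max(np.linalg.norm(s[idx]) / w for idx, w in zip(groups, weights))
    return c * np.quantile(stats, 1.0 - alpha)
```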
3. Optimization Algorithms and Computational Scalability
The non-smoothness of both the rank objective and the penalty presents algorithmic challenges. The proposed solution is a proximal augmented Lagrangian method (PALM) applied to the dual reformulation:
- Dualization introduces an auxiliary variable $z$ with the constraint $z = y - X\beta$, splitting the loss and penalty components.
- Proximal terms (e.g., $\tfrac{1}{2\sigma_k}\|\beta - \beta^{k}\|_2^2$) regularize the subproblems, addressing singularity and ill-conditioning.
- Subproblems involving the conjugate functions $L_n^{*}$ and $P^{*}$—the latter being the dual of the group penalty—are solved via efficient semismooth Newton (SSN) schemes.
- The rank loss structure admits ordered lasso-like representations, enabling per-iteration costs that are near-linear in the sample size.
- The block structure of the group lasso allows blockwise handling and reuse of subdifferential computations (Lin et al., 13 Oct 2025); a sketch of the underlying group proximal map follows below.
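As an illustration of the blockwise structure referenced above, the group proximal (soft-thresholding) map below is the generic building block such subproblems call repeatedly; it is a standard operator, not the paper's full PALM+SSN routine.

```python
import numpy as np

def prox_group_lasso(v, groups, weights, tau):
    """Proximal map of tau * sum_g w_g ||beta_g||_2: shrink each group's
    block toward zero, setting it exactly to zero when its norm is small."""
    out = v.copy()
    for idx, w in zip(groups, weights):
        norm = np.linalg.norm(v[idx])
        scale = max(0.0, 1.0 - tau * w / norm) if norm > 0 else 0.0
        out[idx] = scale * v[idx]
    return out
```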
Extensive numerical benchmarks confirm that the PALM+SSN approach scales nearly linearly with predictor dimension and, when compared to state-of-the-art triple-loop methods (e.g., proximal–proximal majorization–minimization), offers substantial speedup for large-scale high-dimensional regression.
4. Robustness and Estimation Accuracy in Statistical Applications
Empirical evaluation demonstrates consistently competitive (often superior) estimation accuracy and support recovery—especially in the presence of heavy-tailed or Cauchy noise—compared to group lasso-regularized least squares and $\ell_1$-based rank regression alternatives. The experiments span varying covariance structures, group signal patterns (uniform, decaying), and both synthetic and real-world datasets from diverse domains (e.g., genomics, neuroimaging); a data-generation sketch in the spirit of these experiments follows the highlights below.
Highlights include:
- Effective and interpretable variable selection at the group level, even when signal patterns differ substantially between groups.
- Marked resilience to outlier contamination and non-Gaussian errors without model-specific tuning.
- Applicability to ultra-high dimensional settings (hundreds of thousands of features), with runtimes and memory usage controlled to practical levels (Lin et al., 13 Oct 2025).
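The experimental setup described above can be emulated with a small data generator; the sketch below uses illustrative dimensions, a uniform group signal, and Cauchy noise as one heavy-tailed option, and is not the paper's exact simulation design.

```python
import numpy as np

def simulate_grouped_data(n=200, n_groups=50, group_size=5,
                          active_groups=3, noise="cauchy", seed=0):
    """Grouped-sparse linear model with optional heavy-tailed (Cauchy) errors."""
    rng = np.random.default_rng(seed)
    p = n_groups * group_size
    X = rng.standard_normal((n, p))
    beta = np.zeros(p)
    beta[: active_groups * group_size] = 1.0     # uniform signal in the first few groups
    eps = rng.standard_cauchy(n) if noise == "cauchy" else rng.standard_normal(n)
    y = X @ beta + eps
    groups = [np.arange(g * group_size, (g + 1) * group_size) for g in range(n_groups)]
    return X, y, beta, groups
```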
5. Extensions and Connections to Related Methodologies
Variations and extensions of group lasso regularized rank regression encompass:
- Use of adaptive weights (as in adaptive group lasso for sparse reduced rank regression) to achieve minimax-optimal rates and variable selection consistency under rank constraints (He et al., 2016).
- Bayesian analogues employing hierarchical shrinkage priors and post-processing with group lasso for simultaneous low-rank and group-sparse estimation (Chakraborty et al., 2016).
- Square-root (scale-free) group penalties, which provide robust parameter-tuning regimes independent of noise scale, a property shown to be interpretable as distributionally robust optimization (DRO) (Chu et al., 2021, Bunea et al., 2013).
- Flexible groupings and algorithmic advances (e.g., restart or extrapolation schemes, as in low-rank matrix recovery via FLGSR (Yu et al., 18 Jan 2024)) that bring computational efficiency and flexibility to the framework.
- Extensions to multi-response, multi-group, or matrix-valued coefficient settings, where nuclear norm or blockwise penalties are natural generalizations (Zhou et al., 2012, Li et al., 2018, Hultman et al., 18 Mar 2025).
6. Practical Considerations and Implementation
Implementation in high-dimensional settings requires the following considerations:
- Choice and construction of predictor groups should reflect domain knowledge—e.g., genomic pathways, brain regions, or variable type.
- Weights should be scaled appropriately to account for group sizes and variances, as unweighted penalties may bias selection toward or against large groups (see the weighting sketch after this list).
- Efficient solver access (e.g., C/Fortran-backed coordinate or blockwise descent methods, as in the sparsegl R package (Liang et al., 2022)) is essential for real data problems, especially when the design matrix is large and/or sparse.
- For applications such as GWAS or distributed biobank studies, distributed optimization with group-wise screening and privacy preservation can be incorporated (Li et al., 2017).
- Comparing cross-validation, information criteria, and the pivotal simulation-based rule for tuning may be advisable in exploratory or variable-noise settings.
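As a sketch of the weighting point noted in this list, a common convention is to take each weight proportional to the square root of the group size; the helper below follows that heuristic (a standard default, not a recommendation specific to Lin et al., 13 Oct 2025).

```python
import numpy as np

def sqrt_size_weights(groups):
    """Default group weights w_g = sqrt(|G_g|), balancing groups of unequal size."""
    return np.array([np.sqrt(len(idx)) for idx in groups])
```

In the sketches above, these weights would simply be passed as the `weights` argument.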
Group lasso regularized rank regression thus constitutes an advanced, robust, and scalable approach to high-dimensional regression under grouped sparsity and non-Gaussian errors, supported by rigorous theory, efficient algorithms, and empirical validation across scientific domains (Lin et al., 13 Oct 2025).