
Latent Group Lasso

Updated 12 March 2026
  • Latent Group Lasso is a structured sparsity method that selects unions of potentially overlapping feature groups using latent variable decomposition.
  • It employs a convex penalty summing groupwise L2 norms, ensuring accurate model recovery and interpretability under high-dimensional constraints.
  • Efficient algorithms, such as covariate duplication and proximal methods, enable practical application in genomics, hierarchical modeling, and network analysis.

The Latent Group Lasso (LGL) is a structured sparsity-inducing technique that generalizes the classical group Lasso to settings where groups of features may overlap. By modeling latent variables supported on predefined groups and penalizing the sum of their groupwise norms, LGL enables selection of unions of potentially overlapping groups while maintaining convexity and meaningful support properties. This formalism supports a rich class of model-encoded dependencies, allowing for principled penalized regression under complex structured sparsity—an essential tool in genomics, hierarchical modeling, and applications requiring domain-driven constraints.

1. Mathematical Formulation and Norm Structure

Given observations $y \in \mathbb{R}^n$, a predictor matrix $X \in \mathbb{R}^{n \times p}$, and a collection of groups $G = \{g_1, \ldots, g_I\}$ with $g_i \subset \{1, \ldots, p\}$ (overlaps are permitted), the LGL norm is defined via latent variables $v^{(g)} \in \mathbb{R}^p$, one for each group $g$, satisfying:

  • $\operatorname{supp}(v^{(g)}) \subset g$
  • $\sum_{g \in G} v^{(g)} = w$ (the regression coefficient vector)

The LGL penalty is
$$\Omega(w) = \min_{\{v^{(g)}\} :\ \sum_g v^{(g)} = w}\ \sum_{g \in G} d_g\, \|v^{(g)}\|_2,$$
where $d_g > 0$ are group weights.

The penalized objective takes the standard form
$$\min_{w \in \mathbb{R}^p} L(w) + \lambda\, \Omega(w),$$
with $L(\cdot)$ a convex loss (e.g., squared error or logistic).
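To make the definition concrete, the penalty can be evaluated for a small example by minimizing over feasible latent decompositions directly. This is a sketch using a general-purpose solver, with a hypothetical helper name of our choosing; it illustrates the definition rather than how LGL is solved in practice:

```python
import numpy as np
from scipy.optimize import minimize

def lgl_norm(w, groups, d):
    """Evaluate Omega(w) = min over {v^(g)} of sum_g d_g * ||v^(g)||_2,
    subject to supp(v^(g)) in g and sum_g v^(g) = w."""
    p = len(w)
    sizes = [len(g) for g in groups]
    offs = np.concatenate([[0], np.cumsum(sizes)])

    def unpack(x):
        # Rebuild each latent vector v^(g), supported on its group g.
        vs = []
        for i, g in enumerate(groups):
            v = np.zeros(p)
            v[list(g)] = x[offs[i]:offs[i + 1]]
            vs.append(v)
        return vs

    def objective(x):
        return sum(dg * np.linalg.norm(v) for dg, v in zip(d, unpack(x)))

    # Equality constraint: the latent vectors must sum to w.
    cons = {"type": "eq", "fun": lambda x: sum(unpack(x)) - w}
    # Start by splitting each coordinate equally among the groups covering it.
    cover = np.array([sum(j in g for g in groups) for j in range(p)], dtype=float)
    x0 = np.concatenate([np.asarray(w)[list(g)] / cover[list(g)] for g in groups])
    res = minimize(objective, x0, constraints=[cons], method="SLSQP")
    return res.fun
```

For $w = (1, 2, 1)$ with overlapping groups $\{1,2\}$ and $\{2,3\}$ and unit weights, the symmetric split $v^{(1)} = (1,1,0)$, $v^{(2)} = (0,1,1)$ is optimal, giving $\Omega(w) = 2\sqrt{2}$.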

The dual norm is

$$\Omega^*(\alpha) = \max_{g \in G} d_g^{-1} \|\alpha_g\|_2$$

and the subdifferential at $w$ comprises

$$\partial \Omega(w) = \{\alpha : \Omega^*(\alpha) \leq 1,\ \alpha^{\top} w = \Omega(w)\}$$

where, for each $g$, if $v^{(g)} \neq 0$ then $\alpha_g = d_g\, v^{(g)} / \|v^{(g)}\|_2$, and if $v^{(g)} = 0$ then $\|\alpha_g\|_2 \leq d_g$ (Obozinski et al., 2011).
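Unlike the primal norm, the dual norm has a simple closed form that is easy to check in code (a minimal sketch; the helper name is ours):

```python
import numpy as np

def lgl_dual_norm(alpha, groups, d):
    """Dual of the LGL norm: the largest weighted groupwise L2 norm,
    Omega*(alpha) = max_g ||alpha_g||_2 / d_g."""
    return max(np.linalg.norm(alpha[list(g)]) / dg for g, dg in zip(groups, d))
```

For example, with $\alpha = (3, 4, 0)$, groups $\{1,2\}$ and $\{2,3\}$, and weights $(1, 2)$, the maximum is attained on the first group with value $\|(3,4)\|_2 / 1 = 5$.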

2. Support Properties and Model Selection Consistency

The support induced by LGL is the union of those groups $g$ for which $v^{(g)} \neq 0$ in the optimal latent decomposition, a "weak group-support." For the linear model $y = Xw^* + \varepsilon$, LGL achieves model recovery under incoherence-type assumptions involving the Gram matrix $\Sigma$ on the true support $J$. Specifically, to ensure no spurious groups:
$$\forall\, g \notin \mathcal{G}_1(w^*): \quad \|\Sigma_{g, J_1}\, \Sigma_{J_1, J_1}^{-1}\, \alpha_{J_1}(w^*)\|_2 \leq d_g,$$
with strict inequality for exact exclusion (Obozinski et al., 2011).

Under suitable decay of $\lambda_n$ ($\lambda_n \to 0$ but $\lambda_n \sqrt{n} \to \infty$), LGL selects no false-positive groups with high probability as $n \to \infty$. When the decomposition is essentially unique, the group-support is also exactly recovered.

3. Role and Selection of Group Weights

The weights $\{d_g\}$ crucially determine the admissible supports and calibrate the penalty between groups of different sizes or nesting relationships:

  • To prevent redundancy, if $g \subset h$, require $d_g < d_h$.
  • To avoid dominance by large groups, the weights must scale sufficiently steeply with group size; for instance, $d_k = \sqrt{k + c\sqrt{k}}$ suffices to control spurious group activation under pure noise.
  • Alternative weightings, such as $d_k = k^{\gamma}$ with $\gamma \in (0, 1/2)$, are viable for fine-tuning FDR/FNR trade-offs, with $\gamma \approx 1/4$ yielding balanced selection (Obozinski et al., 2011).
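Both weight schedules are one-liners; an illustrative sketch (function names are ours, not from the cited papers):

```python
import math

def weight_sqrt(k, c=1.0):
    """d_k = sqrt(k + c*sqrt(k)): scales steeply enough with group size k
    to control spurious activation of large groups under pure noise."""
    return math.sqrt(k + c * math.sqrt(k))

def weight_power(k, gamma=0.25):
    """d_k = k^gamma with gamma in (0, 1/2); gamma ~ 1/4 balances FDR/FNR."""
    return k ** gamma
```

Both schedules are strictly increasing in $k$, so nested groups $g \subset h$ automatically satisfy the no-redundancy requirement $d_g < d_h$.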

4. Algorithms and Computational Strategies

Efficient LGL algorithms exploit the special structure of latent decomposition:

  • Covariate duplication: Construct an expanded design matrix by duplicating each variable $i$ for every group $g$ containing $i$. The optimization reduces to a standard disjoint group Lasso in this expanded space, admitting block coordinate descent:

    Initialize v^(g) = 0 for all g. Repeat until convergence:
      for each group g in G in cyclic order:
        r ← y − Σ_{h≠g} X_h v^(h)
        z ← X_gᵀ r / n
        v^(g) ← (1 − λ d_g / ‖z‖₂)₊ · z
    w ← Σ_g v^(g)
  • Proximal methods: Alternative approaches apply accelerated proximal-gradient descent on $w$, computing $\operatorname{prox}_{\lambda \Omega}(\cdot)$ by projection onto the intersection of norm-constrained cylinders, possibly using a dual projected-Newton method for the Euclidean case (Villa et al., 2012).
  • Active-set strategies: Iteratively restrict attention to the currently active groups, those with $\|u_{G_r}\|_2 > \tau$. This yields major computational speedups, especially as the solution sparsifies and the number of active groups stabilizes (Villa et al., 2012).
  • MKL interpretations: LGL can be seen as Multiple Kernel Learning with each group contributing a kernel $K_g = X_g X_g^{\top}$ and optimizing a convex combination under constraints related to $d_g$ (Obozinski et al., 2011).
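The covariate-duplication block coordinate descent can be made runnable as follows. This is a simplified sketch, not the cited authors' implementation; it assumes each block is normalized so that $X_g^{\top} X_g / n = I$, in which case the blockwise minimizer is an exact group soft-threshold (otherwise an inner solver would be needed per block):

```python
import numpy as np

def lgl_bcd(X, y, groups, d, lam, n_iter=200):
    """Block coordinate descent for LGL via covariate duplication:
    one latent vector v^(g) per group, updated by group soft-thresholding."""
    n, p = X.shape
    v = [np.zeros(len(g)) for g in groups]  # latent vectors on their supports
    for _ in range(n_iter):
        for i, g in enumerate(groups):
            # Residual excluding group i's current contribution.
            r = y - sum(X[:, list(h)] @ v[j]
                        for j, h in enumerate(groups) if j != i)
            z = X[:, list(g)].T @ r / n
            nz = np.linalg.norm(z)
            # Group soft-threshold: v^(g) <- (1 - lam*d_g/||z||)_+ * z
            v[i] = max(0.0, 1.0 - lam * d[i] / nz) * z if nz > 0 else 0.0 * z
    # Recover w as the sum of the latent vectors.
    w = np.zeros(p)
    for g, vg in zip(groups, v):
        w[list(g)] += vg
    return w
```

On an orthogonal design the iteration converges in one sweep: inactive groups see a zero correlation $z$ and stay at zero, while active groups are shrunk by the factor $(1 - \lambda d_g / \|z\|_2)_+$.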

5. Extensions: Hierarchical, Overlapping, and Rule-Based Grouping

LGL underpins a suite of structured penalties including:

  • Hierarchical models: When groups are derived from a DAG structure, LGL (more precisely, "Latent Overlapping Group Lasso" or LOG) regularizes via latent vectors on ancestor sets. This ensures that if a deep parameter is nonzero, all its ancestors are, enforcing atomic-level hierarchical zero patterns (Yan et al., 2015).
  • Complex selection rules: LGL readily encodes arbitrary combinatorial rules among predictors through appropriate group collection design. This enables regularization under domain-mandated constraints (e.g., strong heredity, force-in groups, logical interaction inclusion). The support of the estimator then exactly aligns with the allowed model dictionary as determined by these rules (Wang et al., 2022).
  • Network and latent structure: LGL has been generalized to latent group structures induced by networks, where no explicit group labels are required. Here, the penalty is defined by a heat-flow on a Laplacian-encoded graph, smoothly interpolating between standard Lasso and classical group Lasso by varying the diffusion time, and admitting efficient local Monte Carlo optimization schemes (Ghosh et al., 20 Jul 2025).
  • Time-varying and panel data contexts: Estimation of latent group structures in time-varying panel data leverages adaptive group fused-Lasso penalties, identifying group homogeneity in coefficient trajectories while maintaining oracle and clustering consistency under theoretical guarantees (Haimerl et al., 29 Mar 2025).

6. Empirical Performance and Applications

LGL exhibits marked empirical advantages over standard Lasso and disjoint group Lasso:

  • Simulation studies: In synthetic problems with overlapping groups, LGL achieves nearly perfect support recovery (e.g., ≈99% at $n = 100$) compared to plain Lasso, which fails consistently. For chain graphs, larger block recovery is possible only for $k \geq 2$ using LGL, surpassing Lasso (Obozinski et al., 2011).
  • Biological applications: On breast cancer microarray data, employing groups from KEGG pathways, LGL delivers improvements in balanced accuracy (2–12%), reduces model complexity, and identifies more interpretable and reproducible biological signatures (Obozinski et al., 2011, Villa et al., 2012).
  • Structured network domains: Graph-Lasso with LGL selects much larger and more coherent subnetworks, aligning better with biological relevance compared to 1\ell_1 approaches (Obozinski et al., 2011).
  • Hierarchical modeling and covariance estimation: In time series and banded covariance estimation, LGL matches or betters group Lasso in estimation accuracy and support recovery while offering simpler and computationally efficient proximal operators—especially for path- and tree-structured groupings (Yan et al., 2015).
  • Prediction models under selection constraints: LGL encoding of clinical selection rules yields sparser models that respect all mandated dependencies and improves cross-validated risk compared to unconstrained (adaptive) Lasso (Wang et al., 2022).

7. Connections, Generalizations, and Complexity Considerations

  • LGL’s convexity and latent variable formulation allow seamless extension to overlapping, nested, or network-defined groupings while ensuring tractable optimization via modern first-order primal-dual or block coordinate strategies.
  • The computational overhead due to group overlap is mitigated by active set acceleration, projection-based prox-computation, and efficient handling of the latent variable decomposition, enabling practical deployment in high-dimensional regimes without need for dimensionality reduction pre-processing (Villa et al., 2012).
  • The formulation's flexibility accommodates classical $\ell_1$ and disjoint $\ell_{2,1}$ group Lasso as special cases by, respectively, setting each group to a singleton or using a disjoint partition.
  • Network-induced latent group penalties facilitate learning under latent grouping not specified a priori, with sample complexity scaling logarithmically in ambient dimension and no requirement for variable pre-clustering (Ghosh et al., 20 Jul 2025).
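The special cases above are easy to verify: when the groups are disjoint, the latent decomposition is unique (each $v^{(g)}$ must equal $w$ restricted to $g$), so the LGL norm reduces to the closed form $\sum_g d_g \|w_g\|_2$. A minimal check (helper name is ours):

```python
import numpy as np

def lgl_norm_disjoint(w, groups, d):
    """LGL norm for disjoint groups: the decomposition is unique, so
    Omega(w) = sum_g d_g * ||w_g||_2 in closed form."""
    return sum(dg * np.linalg.norm(w[list(g)]) for g, dg in zip(groups, d))

w = np.array([3.0, -4.0, 2.0])
# Singleton groups with unit weights recover the L1 norm: 3 + 4 + 2 = 9.
l1 = lgl_norm_disjoint(w, [[0], [1], [2]], [1.0, 1.0, 1.0])
# A disjoint partition recovers the classical L2,1 group Lasso norm: 5 + 2 = 7.
l21 = lgl_norm_disjoint(w, [[0, 1], [2]], [1.0, 1.0])
```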

A plausible implication is that LGL and its modern extensions now serve as essential tools for interpretable and structure-aware feature selection in high-dimensional statistics, with diverse applications from biomedicine to econometrics and statistical network analysis.
