GHPP: Group Hadamard Product Parametrization
- GHPP is a framework for overparameterizing structured sparsity problems using a groupwise Hadamard product map.
- It replaces non-smooth group penalties with smooth surrogate penalties, allowing fully differentiable optimization while preserving the original objective’s minimizers.
- Empirical results in sparse regression, deep network pruning, and structured filter sparsity demonstrate GHPP's efficacy in enhancing sparsity and predictive performance.
The Group Hadamard Product Parametrization (GHPP) is a framework for overparameterizing structured sparsity problems using a groupwise Hadamard product map. By replacing non-smooth group sparsity-inducing penalties such as the group lasso ($\ell_{2,1}$ norm) with smooth surrogate penalties in an expanded parameter space, GHPP enables fully differentiable and approximation-free optimization using standard gradient-based methods. This approach preserves both global and local minima of the original objective and generalizes to a spectrum of structured and unstructured regularization settings, including deep and non-convex variants (Kolb et al., 2023).
1. Mathematical Construction and Surrogate Penalty Structure
Given a parameter vector $\beta \in \mathbb{R}^p$ partitioned into $G$ disjoint groups, write $\beta = (\beta_1, \dots, \beta_G)$ with $\beta_g \in \mathbb{R}^{p_g}$. GHPP introduces two sets of surrogate variables:
- $u_g \in \mathbb{R}^{p_g}$ (groupwise unconstrained vectors)
- $\omega_g \in \mathbb{R}$ (group scalars)
The Group Hadamard-product map is defined as
$$\beta = \Phi(\omega, u), \qquad \beta_g = \omega_g u_g \quad (g = 1, \dots, G),$$
with each $\omega_g$ repeated within its group, so that $\Phi(\omega, u) = \tilde\omega \odot u$ for $\tilde\omega = (\omega_1 \mathbf{1}_{p_1}^\top, \dots, \omega_G \mathbf{1}_{p_G}^\top)^\top$.
The original non-smooth regularized problem, as in the group lasso, is
$$\min_{\beta} \; L(\beta) + \lambda \sum_{g=1}^{G} \|\beta_g\|_2,$$
where $L$ is a smooth loss. GHPP transfers this to a smooth surrogate
$$\min_{\omega, u} \; L(\Phi(\omega, u)) + \frac{\lambda}{2} \sum_{g=1}^{G} \left( \omega_g^2 + \|u_g\|_2^2 \right).$$
For any fixed $\beta_g$, the minimal value of $\frac{1}{2}(\omega_g^2 + \|u_g\|_2^2)$ subject to $\omega_g u_g = \beta_g$ is $\|\beta_g\|_2$ (by AM–GM), ensuring exact recovery of the original penalty:
$$\min_{\omega_g u_g = \beta_g} \; \frac{1}{2}\left( \omega_g^2 + \|u_g\|_2^2 \right) = \|\beta_g\|_2.$$
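This variational identity can be checked numerically. The sketch below (plain NumPy; the example group and scan grid are illustrative, not from the paper) scans scalar–vector factorizations of a fixed group and confirms the AM–GM bound:

```python
import numpy as np

beta_g = np.array([1.0, -2.0, 0.5, 1.5, -0.5])   # a fixed group of coefficients
norm_beta = np.linalg.norm(beta_g)               # group-lasso penalty ||beta_g||_2

# Scan factorizations beta_g = omega * u over omega > 0; then u = beta_g / omega
# and the smooth surrogate penalty is (omega^2 + ||u||^2) / 2.
omegas = np.linspace(0.1, 5.0, 2000)
surrogate = 0.5 * (omegas**2 + (norm_beta / omegas) ** 2)

# AM-GM: the surrogate majorizes ||beta_g||_2, with equality at the
# balanced point omega = sqrt(||beta_g||_2).
assert np.all(surrogate >= norm_beta - 1e-9)
print(surrogate.min(), norm_beta, omegas[surrogate.argmin()], np.sqrt(norm_beta))
```

The minimum over the grid matches $\|\beta_g\|_2$ up to discretization error, attained where $\omega = \sqrt{\|\beta_g\|_2}$, i.e., at the balance $|\omega| = \|u\|_2$.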
2. Theoretical Guarantees: Equivalence and No Spurious Minima
Under the assumptions that the parametrization map is smooth and surjective, the surrogate penalty is block-separable, and the minimizer structure is continuous, Kolb et al. (Thm 3.1) establish that the surrogate problem
$$\min_{\omega, u} \; \tilde F(\omega, u) := L(\Phi(\omega, u)) + \frac{\lambda}{2} \sum_{g} \left( \omega_g^2 + \|u_g\|_2^2 \right)$$
is equivalent to the original problem
$$\min_{\beta} \; F(\beta) := L(\beta) + \lambda \sum_{g} \|\beta_g\|_2$$
in the following precise sense:
- Infima are identical: $\inf_{\omega, u} \tilde F(\omega, u) = \inf_{\beta} F(\beta)$.
- Every minimizer $(\omega^*, u^*)$ of $\tilde F$ corresponds to a minimizer of $F$ with $\beta^* = \Phi(\omega^*, u^*)$ and $|\omega_g^*| = \|u_g^*\|_2$.
- Conversely, every minimizer of $F$ lifts, via a balanced factorization $\beta_g^* = \omega_g^* u_g^*$, to a minimizer of $\tilde F$.
The surrogate penalty majorizes the group-lasso term, attaining equality exactly at the arithmetic–geometric mean (AM–GM) balance points $|\omega_g| = \|u_g\|_2$. Local openness of $\Phi$ at these points ensures that no new (“spurious”) local minima are introduced by the surrogate reformulation.
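The equivalence can be inspected numerically at the balanced lift. The sketch below (plain NumPy; the group partition, `lam`, and all helper names are illustrative) constructs the balanced $(\omega, u)$ for a given $\beta$ and verifies that the map reproduces $\beta$ and that the surrogate and original penalties coincide there:

```python
import numpy as np

lam = 0.3
groups = [np.array([0, 1]), np.array([2, 3, 4])]
beta = np.array([0.8, -1.2, 0.5, 2.0, -0.3])

def original_penalty(b):
    # Group-lasso penalty: lam * sum_g ||b_g||_2
    return lam * sum(np.linalg.norm(b[g]) for g in groups)

def surrogate_penalty(omega, u):
    # Smooth surrogate: (lam/2) * sum_g (omega_g^2 + ||u_g||^2)
    return 0.5 * lam * sum(omega[i] ** 2 + np.linalg.norm(u[g]) ** 2
                           for i, g in enumerate(groups))

def balanced_lift(b):
    # For each group: omega_g = sqrt(||b_g||_2), u_g = b_g / omega_g,
    # so that |omega_g| = ||u_g||_2 and omega_g * u_g = b_g.
    omega, u = np.zeros(len(groups)), np.zeros_like(b)
    for i, g in enumerate(groups):
        omega[i] = np.sqrt(np.linalg.norm(b[g]))
        if omega[i] > 0:
            u[g] = b[g] / omega[i]
    return omega, u

omega, u = balanced_lift(beta)
recon = np.zeros_like(beta)
for i, g in enumerate(groups):
    recon[g] = omega[i] * u[g]            # the map Phi reproduces beta

assert np.allclose(recon, beta)
assert np.isclose(surrogate_penalty(omega, u), original_penalty(beta))
```

At the balanced lift the surrogate contributes $\frac{\lambda}{2}(\|\beta_g\|_2 + \|\beta_g\|_2) = \lambda\|\beta_g\|_2$ per group, exactly the group-lasso penalty.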
3. Algorithmic Implementation
GHPP leverages gradient descent or variants (e.g., Adam) in the overparameterized space $(\omega, u)$. The scheme is as follows:
- Forward pass: Compute $\beta = \Phi(\omega, u)$ with $\beta_g = \omega_g u_g$.
- Loss/penalty: Evaluate $L(\beta) + \frac{\lambda}{2} \sum_g (\omega_g^2 + \|u_g\|_2^2)$ as above.
- Backpropagation (for group $g$): $\nabla_{u_g} = \omega_g \, \nabla_{\beta_g} L + \lambda u_g$ and $\nabla_{\omega_g} = u_g^\top \nabla_{\beta_g} L + \lambda \omega_g$.
- Update: Simultaneous gradient steps for all $(\omega_g, u_g)$, $g = 1, \dots, G$.
Initialization may use the AM–GM balanced point or small random values. The final $\beta$ can optionally be thresholded post-optimization to obtain exact zeros.
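The steps above can be put together in a minimal end-to-end sketch: plain gradient descent on a synthetic group-sparse linear regression (NumPy with manual gradients; the problem sizes, `lam`, `lr`, and `steps` are illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
n, p = 200, 9
beta_true = np.zeros(p)
beta_true[:3] = [1.5, -2.0, 1.0]              # only the first group is active
X = rng.normal(size=(n, p))
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam, lr, steps = 0.1, 0.05, 5000
omega = np.full(len(groups), 0.5)             # small balanced initialization
u = np.full(p, 0.5)

def assemble(omega, u):
    # Forward pass beta = Phi(omega, u): repeat each omega_g within its group.
    beta = np.empty(p)
    for i, g in enumerate(groups):
        beta[g] = omega[i] * u[g]
    return beta

for _ in range(steps):
    beta = assemble(omega, u)
    grad_beta = -X.T @ (y - X @ beta) / n     # gradient of the smooth loss
    grad_u = np.empty(p)
    grad_omega = np.empty(len(groups))
    for i, g in enumerate(groups):
        grad_u[g] = omega[i] * grad_beta[g] + lam * u[g]       # d/du_g
        grad_omega[i] = u[g] @ grad_beta[g] + lam * omega[i]   # d/domega_g
    u -= lr * grad_u
    omega -= lr * grad_omega

beta_hat = assemble(omega, u)
print(np.round(beta_hat, 3))                  # inactive groups shrink toward zero
```

Despite using only smooth gradient updates, the inactive groups decay to (numerically) zero, while the active group is recovered up to the usual group-lasso shrinkage.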
4. Empirical Performance and Practical Considerations
Extensive experiments demonstrate GHPP’s effectiveness in classical and deep learning settings:
- Sparse Linear Regression: In high-dimensional settings with only a few nonzero groups, GHPP variants outperformed SCAD, MCP, and Lasso in estimation error, test RMSE, and support recovery. The basic GHPP recovers the group lasso, while deeper factorizations introduce non-convex regularization, improving sparsity and predictive performance relative to convex methods.
- MLP Pruning (Fashion-MNIST): For LeNet-300-100 (≈266k parameters), GHPP retained only a small fraction of the parameters at accuracy comparable to the dense baseline, with deeper factorizations further enhancing sparsity induction.
- Structured Filter Sparsity (VGG, MNIST): Partitioning convolution filters into groups and applying GHPowP, a large fraction of filters could be pruned with negligible accuracy loss, whereas a baseline structured magnitude-pruning approach degraded sharply beyond moderate sparsity levels.
- Compute/Memory Overhead: Overparameterization increases resource requirements. For HPP applied to an MLP, per-sample compute time increases moderately; for ResNet-20/CIFAR-10, batch time (batch size 256) increases as well, with only modest extra GPU memory.
A plausible implication is that, while GHPP introduces overhead, the cost remains manageable on modern hardware.
5. Connections to Existing Parametrizations
GHPP generalizes and unifies a range of overparameterization-based sparsity methods:
- It is a group-structured extension of the basic Hadamard Product Parametrization (HPP) used for $\ell_1$ penalties (Lemma 3.1), and relates to weight-decayed diagonal linear networks that induce group regularization.
- Deeper factorizations of HPP and GHPP correspond to non-convex and mixed group regularizations, respectively, inducing stronger sparsity patterns.
- The GHPowP extension employs non-integer powers, enabling surrogate penalties with arbitrary real exponents and bypassing restrictions inherent to integer-product schemes.
- Parameter sharing (collapsing factors) reduces overhead with minimal effect on induced regularization (Lemma 4.7).
- The smooth variational-form (SVF) framework subsumes a wide variety of sparsity-inducing approaches known from deep learning and optimization literatures.
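For intuition on the ungrouped base case referenced in the first bullet, the following sketch (plain NumPy, illustrative values) verifies the classical HPP identity: minimizing the elementwise weight decay $(u_j^2 + v_j^2)/2$ over factorizations $\beta_j = u_j v_j$ recovers $|\beta_j|$, i.e., weight decay on the factors induces an $\ell_1$ penalty on the product:

```python
import numpy as np

beta = np.array([1.3, -0.7, 2.1, -0.4])       # a fixed coefficient vector
v = np.linspace(0.05, 4.0, 4000)              # scan one factor per coordinate

# For each coordinate, u = b / v, and the minimal weight decay over the
# scan equals |b| (attained at |v| = sqrt(|b|), again by AM-GM).
decay_min = np.array([(0.5 * (v**2 + (b / v) ** 2)).min() for b in beta])
assert np.allclose(decay_min, np.abs(beta), atol=1e-3)   # recovers |beta_j|
```

GHPP replaces the per-coordinate factor pair with one scalar per group, which is what turns the induced $\ell_1$ penalty into the group-lasso $\ell_{2,1}$ penalty.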
6. Broader Context, Extensions, and Unifying Perspective
Kolb et al.’s framework demonstrates that many classical and recent sparsity schemes—across statistics, optimization, and deep learning—are unified as variational forms in suitably overparameterized spaces (Kolb et al., 2023). GHPP, via its smooth surrogate, offers a generic and highly flexible foundation for structured sparsity, with tunable non-convexity and broad compatibility with differentiable programming. This suggests wide applicability to problems requiring structured parameter pruning, high-dimensional feature selection, and network compression.
Extensions such as deeper or more general factorizations (via Hadamard-powers or parameter-collapsing) further expand the method’s scope. The SVF perspective links GHPP to historical works (e.g., Micchelli 2013; Poon 2021), providing both theoretical and algorithmic connections throughout the sparse modeling landscape.