
GHPP: Group Hadamard Product Parametrization

Updated 23 November 2025
  • GHPP is a framework for overparameterizing structured sparsity problems using a groupwise Hadamard product map.
  • It replaces non-smooth group penalties with smooth surrogate penalties, allowing fully differentiable optimization while preserving the original objective’s minimizers.
  • Empirical results in sparse regression, deep network pruning, and structured filter sparsity demonstrate GHPP's efficacy in enhancing sparsity and predictive performance.

The Group Hadamard Product Parametrization (GHPP) is a framework for overparameterizing structured sparsity problems using a groupwise Hadamard product map. By replacing non-smooth group sparsity-inducing penalties such as the group lasso (the $L_{2,1}$ norm) with smooth surrogate penalties in an expanded parameter space, GHPP enables fully differentiable and approximation-free optimization using standard gradient-based methods. This approach preserves both global and local minima of the original objective and generalizes to a spectrum of structured and unstructured regularization settings, including deep and non-convex variants (Kolb et al., 2023).

1. Mathematical Construction and Surrogate Penalty Structure

Given a parameter vector $\bm\beta\in\mathbb{R}^d$ partitioned into $L$ disjoint groups $\mathcal{G}_1,\dots,\mathcal{G}_L$, write $\bm\beta = (\bm\beta_1,\dots,\bm\beta_L)$ with $\bm\beta_j\in\mathbb{R}^{|\mathcal{G}_j|}$. GHPP introduces two sets of surrogate variables:

  • $\bm u = (\bm u_1,\dots,\bm u_L)\in\mathbb{R}^d$ (groupwise unconstrained vectors)
  • $\bm\nu = (\nu_1,\dots,\nu_L)\in\mathbb{R}^L$ (group scalars)

The Group Hadamard-product map is defined as

$$K\colon \mathbb{R}^d \times \mathbb{R}^L \rightarrow \mathbb{R}^d, \quad (\bm u,\bm\nu)\mapsto \bm u \odot_{\mathcal{G}}\bm\nu = (\nu_j\,\bm u_j)_{j=1}^L = \bm\beta$$

with each $\nu_j$ repeated within its group.
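As a concrete illustration, the map $K$ can be sketched in a few lines of plain Python; the group layout and values below are invented for the example, not taken from the paper:

```python
# Minimal sketch of the group Hadamard product map K: (u, nu) -> beta,
# where beta_j = nu_j * u_j for each group j.
def group_hadamard(u, nu, groups):
    """u: flat list of length d (surrogate vectors, concatenated);
    nu: list of length L (one scalar per group);
    groups: list of L index lists partitioning range(d)."""
    beta = [0.0] * len(u)
    for j, idx in enumerate(groups):
        for i in idx:
            beta[i] = nu[j] * u[i]  # nu_j is shared across its whole group
    return beta

# Example: d = 4 split into two groups of size 2.
groups = [[0, 1], [2, 3]]
u = [1.0, 2.0, 3.0, 4.0]
nu = [0.5, 2.0]
print(group_hadamard(u, nu, groups))  # [0.5, 1.0, 6.0, 8.0]
```

Scaling each group by a single shared $\nu_j$ is what lets the penalty act on whole groups at once.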

The original non-smooth regularized problem, as in group lasso, is

$$P(\bm\beta)=\mathcal{L}(\bm\beta) + 2\lambda\sum_{j=1}^L \|\bm\beta_j\|_2,$$

where $\mathcal{L}$ is a smooth loss. GHPP transfers this to a smooth surrogate:

$$Q(\bm u, \bm\nu) = \mathcal{L}(\bm u \odot_{\mathcal{G}} \bm\nu) + \lambda\sum_{j=1}^L \bigl( \|\bm u_j\|_2^2 + \nu_j^2 \bigr).$$

For any fixed $\bm\beta$, the minimal penalty in $(\bm u_j, \nu_j)$ subject to $\nu_j\bm u_j = \bm\beta_j$ is $2\|\bm\beta_j\|_2$, ensuring exact recovery of the original penalty:

$$\min_{K(\bm u, \bm\nu) = \bm\beta} Q(\bm u, \bm\nu) = P(\bm\beta).$$
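A quick numerical sanity check of this penalty-recovery property (my own construction, not code from the paper): for one group with $\bm\beta_j=(3,4)$, the minimal surrogate penalty should be $2\|\bm\beta_j\|_2 = 10$.

```python
import math

# One group beta_j = (3, 4), so ||beta_j||_2 = 5. Fixing nu_j forces
# u_j = beta_j / nu_j; the surrogate penalty is then ||u_j||^2 + nu_j^2.
beta_j = [3.0, 4.0]
norm = math.sqrt(sum(b * b for b in beta_j))  # 5.0

def surrogate_penalty(nu):
    u = [b / nu for b in beta_j]              # the factorization constraint
    return sum(x * x for x in u) + nu * nu

# AM-GM balanced choice nu_j = sqrt(||beta_j||_2) attains the minimum.
balanced = surrogate_penalty(math.sqrt(norm))
print(balanced)                               # ~10.0 == 2 * ||beta_j||_2
print(surrogate_penalty(1.0) >= balanced)     # True: unbalanced splits pay more
```

Any other split of the magnitude between $\bm u_j$ and $\nu_j$ strictly increases the penalty, which is the AM–GM mechanism behind the exact-recovery claim.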

2. Theoretical Guarantees: Equivalence and No Spurious Minima

Under the assumptions that $K$ is smooth, surjective, and block-separable, with a continuous minimizer structure, Kolb et al. (Thm. 3.1) establish that the surrogate problem

$$\min_{\bm u, \bm\nu} Q(\bm u, \bm\nu)$$

is equivalent to the original problem

$$\min_{\bm\beta} P(\bm\beta),$$

in the following precise sense:

  • Infima are identical: $\inf P = \inf Q$.
  • Every minimizer $\hat{\bm\beta}$ of $P$ corresponds to a minimizer $(\hat{\bm u},\hat{\bm\nu})$ of $Q$ with $\hat{\bm\beta} = K(\hat{\bm u},\hat{\bm\nu})$ and $Q(\hat{\bm u},\hat{\bm\nu})=P(\hat{\bm\beta})$.
  • Conversely, minimizers of $Q$ push forward via $K$ to minimizers of $P$.

The surrogate penalty majorizes the group $L_{2,1}$ term, attaining equality uniquely at the arithmetic–geometric mean (AM–GM) balance points. Local openness of $K$ at these points ensures that the surrogate reformulation introduces no new ("spurious") local minima.
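For completeness, the majorization is just the groupwise AM–GM inequality:

```latex
\|\bm u_j\|_2^2 + \nu_j^2
  \;\ge\; 2\,|\nu_j|\,\|\bm u_j\|_2
  \;=\; 2\,\|\nu_j \bm u_j\|_2
  \;=\; 2\,\|\bm\beta_j\|_2,
```

with equality if and only if $\|\bm u_j\|_2 = |\nu_j|$, i.e., exactly at the balance points mentioned above.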

3. Algorithmic Implementation

GHPP leverages gradient descent or variants (e.g., Adam) in the overparameterized space $(\bm u,\bm\nu)$. The scheme is as follows:

  • Forward pass: compute $\bm\beta = K(\bm u, \bm\nu)$.
  • Loss/penalty: evaluate $Q$ as above.
  • Backpropagation (for group $j$):

$$\nabla_{\bm u_j} Q = \nu_j\nabla_{\bm\beta_j} \mathcal{L}(\bm\beta) + 2\lambda \bm u_j$$

$$\partial Q/\partial \nu_j = \bm u_j^\top \nabla_{\bm\beta_j} \mathcal{L}(\bm\beta) + 2\lambda \nu_j$$

  • Update: simultaneous gradient steps for all $\bm u_j, \nu_j$.

Initialization may use the AM–GM balanced point, $\bm u_j^0 = \bm\beta_j^0 / \sqrt{\|\bm\beta_j^0\|_2}$ and $\nu_j^0 = \sqrt{\|\bm\beta_j^0\|_2}$, or small random values. Final $\bm\beta_j$ can optionally be thresholded post-optimization.
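Putting the steps above together, here is a minimal gradient-descent sketch on a toy sparse-regression instance. This is a hypothetical illustration, not the authors' code; the data, group layout, and hyperparameters are invented:

```python
import random

# Plain gradient descent on the GHPP surrogate Q(u, nu) for the toy
# least-squares loss L(beta) = 0.5 * ||X beta - y||^2, with d = 4
# coefficients in two groups of two; the second group is truly zero.
random.seed(0)
groups = [[0, 1], [2, 3]]
n, d, lam, lr = 50, 4, 0.5, 0.002
beta_true = [2.0, -1.5, 0.0, 0.0]
X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
y = [sum(X[i][k] * beta_true[k] for k in range(d)) for i in range(n)]

u = [0.1] * d          # surrogate vectors, flat
nu = [1.0, 1.0]        # one scalar per group

for step in range(8000):
    # Forward pass: beta = K(u, nu)
    beta = [nu[j] * u[i] for j, idx in enumerate(groups) for i in idx]
    # Gradient of the smooth loss with respect to beta
    resid = [sum(X[i][k] * beta[k] for k in range(d)) - y[i] for i in range(n)]
    g = [sum(resid[i] * X[i][k] for i in range(n)) for k in range(d)]
    # Backpropagate through K and take simultaneous steps on (u, nu)
    for j, idx in enumerate(groups):
        dnu = sum(u[i] * g[i] for i in idx) + 2 * lam * nu[j]
        for i in idx:
            u[i] -= lr * (nu[j] * g[i] + 2 * lam * u[i])
        nu[j] -= lr * dnu

beta = [nu[j] * u[i] for j, idx in enumerate(groups) for i in idx]
print([round(b, 2) for b in beta])  # first group near (2, -1.5); second near ~0
```

Note that the zero group is driven toward zero multiplicatively, through both $\bm u_j$ and $\nu_j$ shrinking, so no explicit thresholding is needed to see the group-sparse structure.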

4. Empirical Performance and Practical Considerations

Extensive experiments demonstrate GHPP’s effectiveness in classical and deep learning settings:

  • Sparse Linear Regression ($d\gg n$): with $n=500$, $d=1000$, and $s=10$ nonzeros, GHPP$_k$ (for $k=2,\dots,6$) outperformed SCAD, MCP, and the lasso in estimation error, test RMSE, and support recovery. GHPP$_2$ recovers the group lasso; $k>2$ introduces non-convex $\ell_{2,2/k}$ regularization, improving sparsity and predictive performance relative to convex methods.
  • MLP Pruning (Fashion-MNIST): for LeNet-300-100 ($\approx 270$k parameters), GHPP$_4$ retained $\sim 0.4\%$ of parameters (baseline: $\sim 4\%$) at $75\%$ accuracy, with deeper factorizations (larger $k$) enhancing sparsity induction.
  • Structured Filter Sparsity (VGG, MNIST): partitioning convolution filters and applying GHPowP$_k$, over $90\%$ of filters were pruned with under $1\%$ accuracy loss; a baseline structured magnitude-pruning approach failed past $50\%$ sparsity.
  • Compute/Memory Overhead: overparameterization increases resource requirements. For HPP$_k$ with $k=8$ on the MLP, per-sample compute time increases by less than $3\times$; for ResNet-20/CIFAR-10, $k=8$ increases batch time by less than $5\%$ (batch size 256) with modest extra GPU memory.

A plausible implication is that, while GHPP introduces overhead, the cost remains manageable on modern hardware.

5. Connections to Existing Parametrizations

GHPP generalizes and unifies a range of overparameterization-based sparsity methods:

  • It is a group-structured extension of the basic Hadamard Product Parametrization (HPP) used for $\ell_1$ penalties (Lemma 3.1), and relates to weight-decayed diagonal linear networks that induce group $\ell_{2,1}$ regularization.
  • Deeper factorizations, for HPP ($k>2$) and GHPP respectively, correspond to non-convex $\ell_{2/k}$ or mixed $\ell_{2,2/k}$ regularization, inducing stronger sparsity patterns.
  • The GHPowP extension employs non-integer powers, enabling $\ell_{2,2/k}$ regularization for any real $k>1$ and bypassing the restrictions inherent to integer-product schemes.
  • Parameter sharing (collapsing $k-1$ factors) reduces overhead with minimal effect on the induced regularization (Lemma 4.7).
  • The smooth variational-form (SVF) framework subsumes a wide variety of sparsity-inducing approaches known from the deep learning and optimization literatures.
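To illustrate the depth-$k$ connection for a single scalar coefficient, one can check numerically that the minimal depth-$k$ penalty is $k\,|\beta|^{2/k}$, attained when all factors share the same magnitude. This is my own toy check, not code from the paper:

```python
import math, random

# Depth-k factorization beta = v1 * v2 * ... * vk with penalty
# sum_i v_i^2. By AM-GM the minimum over factorizations is
# k * |beta|^(2/k) -- an l_{2/k}-type penalty, non-convex for k > 2.
def min_penalty(beta, k):
    return k * abs(beta) ** (2.0 / k)

beta, k = 8.0, 3
balanced = abs(beta) ** (1.0 / k)      # all factors equal: 8^(1/3) = 2
print(k * balanced ** 2)               # ~12.0 == min_penalty(8, 3)

# Any unbalanced factorization with the same product pays more:
random.seed(1)
for _ in range(5):
    v1 = random.uniform(0.5, 4.0)
    v2 = random.uniform(0.5, 4.0)
    v3 = beta / (v1 * v2)              # keep the product fixed at beta
    assert v1 * v1 + v2 * v2 + v3 * v3 >= min_penalty(beta, k) - 1e-9
print("unbalanced factorizations never beat the AM-GM point")
```

Since $2/k < 1$ for $k > 2$, the induced penalty grows sublinearly in $|\beta|$, which is why deeper factorizations push harder toward exact zeros than the convex $k=2$ case.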

6. Broader Context, Extensions, and Unifying Perspective

Kolb et al.’s framework demonstrates that many classical and recent sparsity schemes—across statistics, optimization, and deep learning—are unified as variational forms in suitably overparameterized spaces (Kolb et al., 2023). GHPP, via its smooth surrogate, offers a generic and highly flexible foundation for structured sparsity, with tunable non-convexity and broad compatibility with differentiable programming. This suggests wide applicability to problems requiring structured parameter pruning, high-dimensional feature selection, and network compression.

Extensions such as deeper or more general factorizations (via Hadamard-powers or parameter-collapsing) further expand the method’s scope. The SVF perspective links GHPP to historical works (e.g., Micchelli 2013; Poon 2021), providing both theoretical and algorithmic connections throughout the sparse modeling landscape.
