GHPP: Group Hadamard Product Parametrization
- GHPP is a framework for overparameterizing structured sparsity problems using a groupwise Hadamard product map.
- It replaces non-smooth group penalties with smooth surrogate penalties, allowing fully differentiable optimization while preserving the original objective’s minimizers.
- Empirical results in sparse regression, deep network pruning, and structured filter sparsity demonstrate GHPP's efficacy in enhancing sparsity and predictive performance.
The Group Hadamard Product Parametrization (GHPP) is a framework for overparameterizing structured sparsity problems using a groupwise Hadamard product map. By replacing non-smooth group sparsity-inducing penalties such as the group lasso ($\ell_{2,1}$ norm) with smooth surrogate penalties in an expanded parameter space, GHPP enables fully differentiable and approximation-free optimization using standard gradient-based methods. This approach preserves both global and local minima of the original objective and generalizes to a spectrum of structured and unstructured regularization settings, including deep and non-convex variants (Kolb et al., 2023).
1. Mathematical Construction and Surrogate Penalty Structure
Given a parameter vector $\beta \in \mathbb{R}^p$ partitioned into $G$ disjoint groups, write $\beta = (\beta_1, \dots, \beta_G)$ with $\beta_g \in \mathbb{R}^{p_g}$. GHPP introduces two sets of surrogate variables:
- $u_g \in \mathbb{R}^{p_g}$ (groupwise unconstrained vectors)
- $\omega_g \in \mathbb{R}$ (group scalars)
The Group Hadamard-product map is defined as
$$\beta = \Phi(\omega, u), \qquad \beta_g = \omega_g u_g \quad (g = 1, \dots, G),$$
with each $\omega_g$ repeated within its group, so that $\Phi(\omega, u) = \tilde\omega \odot u$ for $\tilde\omega = (\omega_1 \mathbf{1}_{p_1}^\top, \dots, \omega_G \mathbf{1}_{p_G}^\top)^\top$.
The original non-smooth regularized problem, as in the group lasso, is
$$\min_{\beta} \; L(\beta) + \lambda \sum_{g=1}^{G} \|\beta_g\|_2,$$
where $L$ is a smooth loss. GHPP transfers this to a smooth surrogate
$$\min_{\omega, u} \; L(\Phi(\omega, u)) + \frac{\lambda}{2} \sum_{g=1}^{G} \left( \omega_g^2 + \|u_g\|_2^2 \right).$$
For any fixed $\beta_g$, the minimal value of $\frac{1}{2}(\omega_g^2 + \|u_g\|_2^2)$ subject to $\omega_g u_g = \beta_g$ is $\|\beta_g\|_2$ (by AM–GM), ensuring exact recovery of the original penalty:
$$\min_{\omega_g u_g = \beta_g} \; \frac{1}{2}\left( \omega_g^2 + \|u_g\|_2^2 \right) = \|\beta_g\|_2.$$
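This variational identity can be checked numerically. The sketch below (plain NumPy; the example group and scan grid are illustrative, not from the paper) scans scalar–vector factorizations of a fixed group and confirms the AM–GM bound:

```python
import numpy as np

beta_g = np.array([1.0, -2.0, 0.5, 1.5, -0.5])   # a fixed group of coefficients
norm_beta = np.linalg.norm(beta_g)               # group-lasso penalty ||beta_g||_2

# Scan factorizations beta_g = omega * u over omega > 0; then u = beta_g / omega
# and the smooth surrogate penalty is (omega^2 + ||u||^2) / 2.
omegas = np.linspace(0.1, 5.0, 2000)
surrogate = 0.5 * (omegas**2 + (norm_beta / omegas) ** 2)

# AM-GM: the surrogate majorizes ||beta_g||_2, with equality at the
# balanced point omega = sqrt(||beta_g||_2).
assert np.all(surrogate >= norm_beta - 1e-9)
print(surrogate.min(), norm_beta, omegas[surrogate.argmin()], np.sqrt(norm_beta))
```

The minimum over the grid matches $\|\beta_g\|_2$ up to discretization error, attained where $\omega = \sqrt{\|\beta_g\|_2}$, i.e., at the balance $|\omega| = \|u\|_2$.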
2. Theoretical Guarantees: Equivalence and No Spurious Minima
Under the assumptions that the parametrization map is smooth and surjective, the surrogate penalty is block-separable, and the minimizer structure is continuous, Kolb et al. (Thm 3.1) establish that the surrogate problem
$$\min_{\omega, u} \; \tilde F(\omega, u) := L(\Phi(\omega, u)) + \frac{\lambda}{2} \sum_{g} \left( \omega_g^2 + \|u_g\|_2^2 \right)$$
is equivalent to the original problem
$$\min_{\beta} \; F(\beta) := L(\beta) + \lambda \sum_{g} \|\beta_g\|_2$$
in the following precise sense:
- Infima are identical: $\inf_{\omega, u} \tilde F(\omega, u) = \inf_{\beta} F(\beta)$.
- Every minimizer $(\omega^*, u^*)$ of $\tilde F$ corresponds to a minimizer of $F$ with $\beta^* = \Phi(\omega^*, u^*)$ and $|\omega_g^*| = \|u_g^*\|_2$.
- Conversely, every minimizer of $F$ lifts, via a balanced factorization $\beta_g^* = \omega_g^* u_g^*$, to a minimizer of $\tilde F$.
The surrogate penalty majorizes the group-lasso term, attaining equality exactly at the arithmetic–geometric mean (AM–GM) balance points $|\omega_g| = \|u_g\|_2$. Local openness of $\Phi$ at these points ensures that no new (“spurious”) local minima are introduced by the surrogate reformulation.
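The equivalence can be inspected numerically at the balanced lift. The sketch below (plain NumPy; the group partition, `lam`, and all helper names are illustrative) constructs the balanced $(\omega, u)$ for a given $\beta$ and verifies that the map reproduces $\beta$ and that the surrogate and original penalties coincide there:

```python
import numpy as np

lam = 0.3
groups = [np.array([0, 1]), np.array([2, 3, 4])]
beta = np.array([0.8, -1.2, 0.5, 2.0, -0.3])

def original_penalty(b):
    # Group-lasso penalty: lam * sum_g ||b_g||_2
    return lam * sum(np.linalg.norm(b[g]) for g in groups)

def surrogate_penalty(omega, u):
    # Smooth surrogate: (lam/2) * sum_g (omega_g^2 + ||u_g||^2)
    return 0.5 * lam * sum(omega[i] ** 2 + np.linalg.norm(u[g]) ** 2
                           for i, g in enumerate(groups))

def balanced_lift(b):
    # For each group: omega_g = sqrt(||b_g||_2), u_g = b_g / omega_g,
    # so that |omega_g| = ||u_g||_2 and omega_g * u_g = b_g.
    omega, u = np.zeros(len(groups)), np.zeros_like(b)
    for i, g in enumerate(groups):
        omega[i] = np.sqrt(np.linalg.norm(b[g]))
        if omega[i] > 0:
            u[g] = b[g] / omega[i]
    return omega, u

omega, u = balanced_lift(beta)
recon = np.zeros_like(beta)
for i, g in enumerate(groups):
    recon[g] = omega[i] * u[g]            # the map Phi reproduces beta

assert np.allclose(recon, beta)
assert np.isclose(surrogate_penalty(omega, u), original_penalty(beta))
```

At the balanced lift the surrogate contributes $\frac{\lambda}{2}(\|\beta_g\|_2 + \|\beta_g\|_2) = \lambda\|\beta_g\|_2$ per group, exactly the group-lasso penalty.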
3. Algorithmic Implementation
GHPP leverages gradient descent or variants (e.g., Adam) in the overparameterized space $(\omega, u)$. The scheme is as follows:
- Forward pass: Compute $\beta = \Phi(\omega, u)$ with $\beta_g = \omega_g u_g$.
- Loss/penalty: Evaluate $L(\beta) + \frac{\lambda}{2} \sum_g (\omega_g^2 + \|u_g\|_2^2)$ as above.
- Backpropagation (for group $g$): $\nabla_{u_g} = \omega_g \, \nabla_{\beta_g} L + \lambda u_g$ and $\nabla_{\omega_g} = u_g^\top \nabla_{\beta_g} L + \lambda \omega_g$.
- Update: Simultaneous gradient steps for all $(\omega_g, u_g)$, $g = 1, \dots, G$.
Initialization may use the AM–GM balanced point or small random values. The final $\beta$ can optionally be thresholded post-optimization to obtain exact zeros.
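The steps above can be put together in a minimal end-to-end sketch: plain gradient descent on a synthetic group-sparse linear regression (NumPy with manual gradients; the problem sizes, `lam`, `lr`, and `steps` are illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
n, p = 200, 9
beta_true = np.zeros(p)
beta_true[:3] = [1.5, -2.0, 1.0]              # only the first group is active
X = rng.normal(size=(n, p))
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam, lr, steps = 0.1, 0.05, 5000
omega = np.full(len(groups), 0.5)             # small balanced initialization
u = np.full(p, 0.5)

def assemble(omega, u):
    # Forward pass beta = Phi(omega, u): repeat each omega_g within its group.
    beta = np.empty(p)
    for i, g in enumerate(groups):
        beta[g] = omega[i] * u[g]
    return beta

for _ in range(steps):
    beta = assemble(omega, u)
    grad_beta = -X.T @ (y - X @ beta) / n     # gradient of the smooth loss
    grad_u = np.empty(p)
    grad_omega = np.empty(len(groups))
    for i, g in enumerate(groups):
        grad_u[g] = omega[i] * grad_beta[g] + lam * u[g]       # d/du_g
        grad_omega[i] = u[g] @ grad_beta[g] + lam * omega[i]   # d/domega_g
    u -= lr * grad_u
    omega -= lr * grad_omega

beta_hat = assemble(omega, u)
print(np.round(beta_hat, 3))                  # inactive groups shrink toward zero
```

Despite using only smooth gradient updates, the inactive groups decay to (numerically) zero, while the active group is recovered up to the usual group-lasso shrinkage.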
4. Empirical Performance and Practical Considerations
Extensive experiments demonstrate GHPP’s effectiveness in classical and deep learning settings:
- Sparse Linear Regression: In high-dimensional settings with only a few nonzero groups, GHPP variants outperformed SCAD, MCP, and Lasso in estimation error, test RMSE, and support recovery. The basic GHPP recovers the group lasso, while deeper factorizations introduce non-convex regularization, improving sparsity and predictive performance relative to convex methods.
- MLP Pruning (Fashion-MNIST): For LeNet-300-100 (≈266k parameters), GHPP retained only a small fraction of the parameters at accuracy comparable to the dense baseline, with deeper factorizations further enhancing sparsity induction.
- Structured Filter Sparsity (VGG, MNIST): Partitioning convolution filters into groups and applying GHPowP, a large fraction of filters could be pruned with negligible accuracy loss, whereas a baseline structured magnitude-pruning approach degraded sharply beyond moderate sparsity levels.
- Compute/Memory Overhead: Overparameterization increases resource requirements. For HPP applied to an MLP, per-sample compute time increases moderately; for ResNet-20/CIFAR-10, batch time (batch size 256) increases as well, with only modest extra GPU memory.
A plausible implication is that, while GHPP introduces overhead, the cost remains manageable on modern hardware.
5. Connections to Existing Parametrizations
GHPP generalizes and unifies a range of overparameterization-based sparsity methods:
- It is a group-structured extension of the basic Hadamard Product Parametrization (HPP) used for $\ell_1$ penalties (Lemma 3.1), and relates to weight-decayed diagonal linear networks that induce group regularization.
- Deeper factorizations of HPP and GHPP correspond to non-convex and mixed group regularizations, respectively, inducing stronger sparsity patterns.
- The GHPowP extension employs non-integer powers, enabling surrogate penalties with arbitrary real exponents and bypassing restrictions inherent to integer-product schemes.
- Parameter sharing (collapsing factors) reduces overhead with minimal effect on induced regularization (Lemma 4.7).
- The smooth variational-form (SVF) framework subsumes a wide variety of sparsity-inducing approaches known from deep learning and optimization literatures.
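For intuition on the ungrouped base case referenced in the first bullet, the following sketch (plain NumPy, illustrative values) verifies the classical HPP identity: minimizing the elementwise weight decay $(u_j^2 + v_j^2)/2$ over factorizations $\beta_j = u_j v_j$ recovers $|\beta_j|$, i.e., weight decay on the factors induces an $\ell_1$ penalty on the product:

```python
import numpy as np

beta = np.array([1.3, -0.7, 2.1, -0.4])       # a fixed coefficient vector
v = np.linspace(0.05, 4.0, 4000)              # scan one factor per coordinate

# For each coordinate, u = b / v, and the minimal weight decay over the
# scan equals |b| (attained at |v| = sqrt(|b|), again by AM-GM).
decay_min = np.array([(0.5 * (v**2 + (b / v) ** 2)).min() for b in beta])
assert np.allclose(decay_min, np.abs(beta), atol=1e-3)   # recovers |beta_j|
```

GHPP replaces the per-coordinate factor pair with one scalar per group, which is what turns the induced $\ell_1$ penalty into the group-lasso $\ell_{2,1}$ penalty.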
6. Broader Context, Extensions, and Unifying Perspective
Kolb et al.’s framework demonstrates that many classical and recent sparsity schemes—across statistics, optimization, and deep learning—are unified as variational forms in suitably overparameterized spaces (Kolb et al., 2023). GHPP, via its smooth surrogate, offers a generic and highly flexible foundation for structured sparsity, with tunable non-convexity and broad compatibility with differentiable programming. This suggests wide applicability to problems requiring structured parameter pruning, high-dimensional feature selection, and network compression.
Extensions such as deeper or more general factorizations (via Hadamard-powers or parameter-collapsing) further expand the method’s scope. The SVF perspective links GHPP to historical works (e.g., Micchelli 2013; Poon 2021), providing both theoretical and algorithmic connections throughout the sparse modeling landscape.