CP Rank Selection via Sparsity-Inducing Priors
- The article surveys sparsity-inducing frameworks that use submodular set functions and convex surrogates to accurately recover the CP tensor rank.
- It covers hierarchical Bayesian techniques and ARD priors that enable group-level shrinkage and automatic relevance determination in multiway data.
- The surveyed approaches balance sparse model order selection with robust estimation through cross-validation, nonconvex regularizers, and scalable variational inference.
A sparsity-inducing prior for CP (CANDECOMP/PARAFAC) rank selection is a probabilistic or regularization framework designed to identify a minimal number of nonzero CP tensor components, thus effectively determining the tensor's rank by inducing sparsity in the set of candidate factors. The construction and implementation of such priors connect convex and nonconvex optimization, hierarchical Bayes, submodular analysis, and modern information criteria, and enable automatic or adaptive model order determination in high-dimensional multiway data.
1. Submodular Functions, Convex Envelopes, and Structured Norms
The selection of a small number of active CP tensor factors—a proxy for CP rank—can be approached as a combinatorial selection problem. Instead of directly minimizing the number of nonzero components, structured sparsity-inducing penalties use a set function $F: 2^V \to \mathbb{R}_+$, where $V$ indexes the candidate factors, to encode both sparsity and prior structural constraints. $F$ is chosen to be nondecreasing and submodular (i.e., $F(A) + F(B) \geq F(A \cup B) + F(A \cap B)$ for all $A, B \subseteq V$), which generalizes the cardinality function.
The relaxation of the combinatorial penalty $F(\operatorname{supp}(w))$ is achieved through its convex envelope on the $\ell_\infty$ ball, constructed from the Lovász extension $f$ of $F$:
$$\Omega(w) = f(|w|) = \sum_{k=1}^{p} |w_{j_k}| \left[ F(\{j_1, \dots, j_k\}) - F(\{j_1, \dots, j_{k-1}\}) \right],$$
for the entries ordered as $|w_{j_1}| \geq |w_{j_2}| \geq \cdots \geq |w_{j_p}|$ (1008.4220). The resulting polyhedral norm provides a convex surrogate tailored to the desired interaction between sparsity and structure.
In the context of CP rank selection, $F$ can specifically penalize the inclusion of additional rank-one components (e.g., $F(A) = |A|$ yields the $\ell_1$ norm), or encode blocks/groups, hierarchies, or couplings among factors to favor more structured low-rank solutions (1008.4220). The theoretical advantage is that the support of the minimizer under such penalties is a stable set for $F$, with support recovery consistency under general conditions.
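As a concrete illustration, the minimal sketch below evaluates the polyhedral norm $\Omega(w) = f(|w|)$ for a user-supplied submodular set function; the helper names `lovasz_norm` and `F_card` are illustrative, not taken from the cited work.

```python
import numpy as np

def lovasz_norm(w, F):
    """Polyhedral norm Omega(w) = f(|w|), where f is the Lovasz extension of a
    nondecreasing submodular set function F (here, F maps a frozenset of
    indices to a float)."""
    order = np.argsort(-np.abs(w))             # indices sorted by decreasing |w_j|
    omega, prev, selected = 0.0, 0.0, set()
    for j in order:
        selected.add(int(j))
        cur = F(frozenset(selected))
        omega += abs(w[j]) * (cur - prev)      # marginal gain of adding index j
        prev = cur
    return omega

# With the cardinality function F(A) = |A|, the norm reduces to the L1 norm.
F_card = lambda A: float(len(A))
w = np.array([0.5, -2.0, 0.0, 1.5])
print(lovasz_norm(w, F_card), np.abs(w).sum())  # both print 4.0
```

An $F$ that instead charges a fixed cost for each group of factors it touches yields a norm that encourages whole groups of rank-one components to enter or leave the model together.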
2. Hierarchical Bayesian and Group Priors
Hierarchical Bayesian frameworks provide sparse priors through Gaussian scale mixtures, where each parameter $\theta_j$ (e.g., an entry of a vector of CP factor weights or loadings) receives a prior:
- $\theta_j \mid \tau_j^2 \sim \mathcal{N}(0, \tau_j^2)$,
- $\tau_j^2 \sim$ Exponential or Inverse-Gamma (1009.1914).
Marginalizing over the scale parameter yields Laplace or generalized t-distributions. Placing an additional hierarchy—such as an inverse-gamma prior at the next layer—produces heavy-tailed, nonconvex sparsity-inducing priors that can adapt to both large signals and noise, introducing less bias for significant CP components (1009.1914). For grouped variables (e.g., all weights for a given CP component), group-level scales allow entire components to be “turned on/off,” inducing block sparsity and facilitating group-level CP rank selection.
The maximum a posteriori (MAP) estimate under such priors is computed by iterative reweighting or expectation-maximization (EM), with adaptive penalties calibrated by current parameter estimates. This approach is computationally efficient and naturally extends to the group sparse CP decomposition setting (1009.1914).
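A minimal sketch of this reweighting idea, assuming a linear observation model $y \approx X\beta$ whose columns are grouped by CP component; the function name `reweighted_map` and its specific weighting rule are simplified illustrations rather than the exact scheme of (1009.1914).

```python
import numpy as np

def reweighted_map(X, y, groups, gamma=1.0, n_iter=50, eps=1e-6):
    """EM-style iterative reweighting for a group-sparse MAP estimate:
    each iteration solves a weighted ridge problem whose per-group weight
    is adapted to the current group norm (small groups get penalized harder).
    X: (n, p) design, y: (n,) response, groups: list of index arrays."""
    p = X.shape[1]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from least squares
    for _ in range(n_iter):
        d = np.ones(p)
        for g in groups:
            d[g] = gamma / (np.linalg.norm(beta[g]) + eps)  # adaptive penalty
        beta = np.linalg.solve(X.T @ X + np.diag(d), X.T @ y)
    return beta
```

Here `groups` would collect, for each CP component, the indices of all of its weights, so that an entire rank-one term can be switched off at once.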
3. Automatic Relevance Determination and Fully Bayesian CP Rank Estimation
Automatic relevance determination (ARD) priors penalize the columns of the factor matrices jointly across all CP modes by assigning each component $r$ a common latent precision $\lambda_r$, with hierarchical Gamma hyperpriors:
- For each mode $n$: $\mathbf{a}_r^{(n)} \sim \mathcal{N}(\mathbf{0}, \lambda_r^{-1} \mathbf{I})$, with $\lambda_r \sim \mathrm{Gamma}(c_0, d_0)$.
As $\lambda_r \to \infty$, the $r$th component is shrunk to zero jointly across all modes. This yields a fully Bayesian CP decomposition with automatic rank selection: initialize with the number of components $R$ large, and posterior inference prunes superfluous columns. Deterministic variational Bayesian (VB) algorithms provide closed-form parameter updates for all posteriors, scaling linearly with the number of observed entries. Predictive Student-t posteriors are naturally produced for imputation and uncertainty quantification in missing data scenarios (Zhao et al., 2014).
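The pruning mechanism can be sketched with point estimates in a few lines; the hyperparameters `c0`, `d0` and the pruning threshold below are illustrative defaults, and the method of (Zhao et al., 2014) itself maintains full variational posteriors rather than these plug-in updates.

```python
import numpy as np

def ard_precision_update(factors, c0=1e-6, d0=1e-6):
    """Point-estimate ARD update for a CP model: component r shares one
    precision lambda_r across all modes, and its estimate grows as the r-th
    columns of the factor matrices shrink.
    factors: list of mode factor matrices, each of shape (I_n, R)."""
    total_rows = sum(A.shape[0] for A in factors)
    col_sq = sum(np.sum(A**2, axis=0) for A in factors)   # sum_n ||a_r^(n)||^2
    return (c0 + 0.5 * total_rows) / (d0 + 0.5 * col_sq)  # Gamma posterior mean

def prune_components(factors, lam, thresh=1e6):
    """Drop components whose precision has diverged (columns shrunk to zero)."""
    keep = lam < thresh
    return [A[:, keep] for A in factors], lam[keep]
```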
4. Flexible Sparsity Priors: Generalized Hyperbolic and Related Advances
Rigid choices of sparsity priors (e.g., Gaussian-gamma or Laplace) may fail for high-rank or low-SNR tensors. The generalized hyperbolic (GH) prior introduces additional flexibility, including parameters that tune both the sharpness at the origin and the tail behavior. The prior for each group (the $r$th column across all modes of a CP component) is a symmetric GH distribution,
with the Gaussian scale mixture representation
$$p(\mathbf{a}_r) = \int_0^\infty \mathcal{N}\!\left(\mathbf{a}_r; \mathbf{0},\, z_r \mathbf{I}\right)\, \mathrm{GIG}(z_r)\, \mathrm{d}z_r,$$
where the mixing density $\mathrm{GIG}(z_r)$ is a generalized inverse Gaussian distribution.
This allows consistent and robust automatic pruning of components even in highly challenging settings, outperforming Gaussian-gamma priors in recovering both low and high tensor ranks at varying SNRs (Cheng et al., 2020). Variational Bayes inference yields closed-form updates for all parameters and latent variables, ensuring scalability.
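In mean-field inference for such Gaussian scale-mixture priors, the required quantities are low-order moments of the generalized inverse Gaussian (GIG) mixing distribution, which have closed forms in terms of modified Bessel functions; the parameterization below (density proportional to $z^{\lambda-1} e^{-(az + b/z)/2}$) is one common convention and may not match the cited paper's notation exactly.

```python
import numpy as np
from scipy.special import kv   # modified Bessel function of the second kind

def gig_mean(lam, a, b):
    """E[z] for z ~ GIG(lam, a, b) with density proportional to
    z**(lam - 1) * exp(-(a*z + b/z)/2), z > 0."""
    s = np.sqrt(a * b)
    return np.sqrt(b / a) * kv(lam + 1, s) / kv(lam, s)

def gig_mean_inv(lam, a, b):
    """E[1/z] for the same distribution; together with E[z], this is what a
    mean-field update for a GH-type prior typically needs."""
    s = np.sqrt(a * b)
    return np.sqrt(a / b) * kv(lam + 1, s) / kv(lam, s) - 2.0 * lam / b
```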
5. Nonconvex, Polyhedral, and Adaptive Regularizers
Beyond convex relaxations, nonconvex sparsity-inducing regularizers, such as group penalties built from concave functions like the Geman penalty, further sharpen the recovered support and reduce estimation bias for large coefficients (Zhao et al., 2018). These penalties more decisively eliminate unnecessary CP components when an over-complete parameterization is provided, and can be solved efficiently with alternating minimization and majorization–minimization algorithms.
Recent work also develops frameworks to systematically generate sparsity-inducing regularizers with closed-form proximity or thresholding operators, enabling scalable optimization in both matrix and tensor (low-tubal-rank) completion problems. When applied to the singular values (or tubes), such regularizers act as nonconvex but computationally efficient rank surrogates, outperforming convex nuclear norm-based surrogates in many settings (Wang et al., 2023, Wang et al., 2023).
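As one concrete example of such a closed-form operator, the sketch below applies firm thresholding, the proximity operator of the minimax concave penalty, to the singular values of a matrix; it is a generic nonconvex rank surrogate in the same spirit as, though not necessarily identical to, the regularizers of the cited works.

```python
import numpy as np

def firm_threshold(x, lam, gamma=3.0):
    """Closed-form thresholding operator of the minimax concave penalty (MCP):
    behaves like soft thresholding near zero but leaves large entries unbiased."""
    ax = np.abs(x)
    return np.where(ax <= lam, 0.0,
           np.where(ax <= gamma * lam,
                    np.sign(x) * gamma * (ax - lam) / (gamma - 1.0),
                    x))

def nonconvex_svt(M, lam, gamma=3.0):
    """Singular-value thresholding with the nonconvex operator above, used as
    a bias-reduced alternative to nuclear-norm (soft) thresholding."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(firm_threshold(s, lam, gamma)) @ Vt
```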
6. Calibration, Cross-Validation, and Information-Theoretic Model Selection
Model selection criteria addressing both sparsity and rank minimization must correctly adjust for the data-driven selection effect. Fixing regularization parameters (e.g., the penalty level in Lasso-type penalties) across CV folds can yield inconsistent selection of sparsity patterns and ranks (She et al., 2018). This motivates cross-validation on the structural selection-projection pattern (e.g., the set of active and projected CP factors) rather than on the penalty magnitude, together with minimax-optimal, scale-free information criteria whose penalty is calibrated to the order of the theoretical prediction error bound, which grows with both the active support size $s$ and the (CP) rank $r$. This framework ensures principled and reproducible rank and sparsity selection, bypassing the need for separate noise estimation (She et al., 2018).
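The following schematic, assuming a generic linear model, compares candidate selection patterns (index sets of active components, e.g. harvested along a solution path) by refitting an unpenalized model per fold instead of tuning a penalty level; `pattern_cv` is an illustrative simplification of the selection-projection cross-validation described in (She et al., 2018).

```python
import numpy as np

def pattern_cv(X, y, patterns, n_folds=5, seed=0):
    """Cross-validate over candidate selection patterns rather than penalty
    magnitudes: each fold refits least squares restricted to the pattern, and
    the pattern with the smallest held-out error is returned."""
    n = X.shape[0]
    fold_id = np.random.default_rng(seed).integers(0, n_folds, size=n)
    scores = []
    for active in patterns:
        active = np.asarray(sorted(active))
        err = 0.0
        for k in range(n_folds):
            tr, va = fold_id != k, fold_id == k
            beta, *_ = np.linalg.lstsq(X[np.ix_(tr, active)], y[tr], rcond=None)
            err += np.sum((y[va] - X[np.ix_(va, active)] @ beta) ** 2)
        scores.append(err / n)
    return patterns[int(np.argmin(scores))], scores
```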
7. Trade-offs, Limitations, and Practical Recommendations
Practical deployment of sparsity-inducing priors for CP rank selection must address the trade-off between sparsity (support size, parsimony) and the risk of discarding significant components. Iterative or cutting-plane strategies that incrementally enforce rank constraints or refine penalties enable exploration of the Pareto front between sparsity and model complexity (Fampa et al., 2020). While convex (polyhedral) penalties afford strong theoretical guarantees and scalable algorithms, nonconvex approaches can further enhance estimation accuracy but pose challenges with local minima and initialization sensitivity.
The choice of prior or penalty should be informed by empirical testing: moment-based or kurtosis-based tests can diagnose deviation from the Laplace (L1) assumption and prompt adaptive switching to heavier- or lighter-tailed generalized power priors (Griffin et al., 2017).
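A simple version of such a diagnostic, assuming the fitted CP coefficients are available as a flat array: the sample excess kurtosis is compared with the Laplace value of 3 (a Gaussian gives 0), and a large discrepancy flags that an L1-type prior may be misspecified.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: ~3 for Laplace data, ~0 for Gaussian data."""
    z = np.asarray(x, dtype=float)
    z = z - z.mean()
    m2, m4 = np.mean(z**2), np.mean(z**4)
    return m4 / m2**2 - 3.0

rng = np.random.default_rng(0)
print(excess_kurtosis(rng.laplace(size=100_000)))  # close to 3
print(excess_kurtosis(rng.normal(size=100_000)))   # close to 0
```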
Summary Table: Methodological Approaches
| Technique | Principal Feature | CP Rank Selection Mechanism |
|---|---|---|
| Submodular/Lovász Extensions | Structured convex surrogate via set function $F$ | Polyhedral norm sparsity, support recovery |
| Hierarchical Bayesian (HAL, ARD) | Gaussian scale mixtures; group & adaptive penalties | Group-wise shrinkage, MAP, Bayesian pruning |
| Generalized Hyperbolic (GH) | Flexible, heavy-tailed prior via Gaussian mixtures | Robust ARD; improved high-rank/low-SNR recovery |
| Nonconvex Regularizers | Bias-reduced, sharp thresholding (e.g., Geman, closed-form prox) | Aggressive component elimination, efficiency |
| Cross-validation/Information Criteria | Calibrated, scale-free, selection-pattern-based CV | Ranking by structural error, minimax optimal |
The construction and calibration of sparsity-inducing priors for CP rank selection synthesize convex geometry, submodular analysis, Bayesian inference, and algorithmic optimization. These techniques collectively enable both accurate rank estimation and robust, interpretable CP decompositions in practical multiway data analysis.