
CP Rank Selection via Sparsity-Inducing Priors

Updated 2 August 2025
  • The paper presents a sparsity-inducing framework using submodular functions and convex surrogates to accurately recover the CP tensor rank.
  • It details hierarchical Bayesian techniques and ARD priors that enable group-level shrinkage and automatic relevance determination in multiway data.
  • The approach balances sparse model order selection with robust estimation through cross-validation, nonconvex regularizers, and scalable variational inference.

A sparsity-inducing prior for CP (CANDECOMP/PARAFAC) rank selection is a probabilistic or regularization framework designed to identify a minimal number of nonzero CP tensor components, thus effectively determining the tensor's rank by inducing sparsity in the set of candidate factors. The construction and implementation of such priors connect convex and nonconvex optimization, hierarchical Bayes, submodular analysis, and modern information criteria, and enable automatic or adaptive model order determination in high-dimensional multiway data.

1. Submodular Functions, Convex Envelopes, and Structured Norms

The selection of a small number of active CP tensor factors—a proxy for CP rank—can be approached as a combinatorial selection problem. Instead of directly minimizing the number of nonzero components, structured sparsity-inducing penalties use a set function $F: 2^V \to \mathbb{R}_+$, where $V$ indexes candidate factors, to encode both sparsity and prior structural constraints. $F$ is chosen to be nondecreasing and submodular (i.e., $F(A) + F(B) \geq F(A \cup B) + F(A \cap B)$ for all $A, B \subseteq V$), which generalizes the cardinality function.

The relaxation of the combinatorial penalty $F(\operatorname{supp}(w))$ is achieved through its convex envelope on the $\ell_\infty$ ball, constructed as the Lovász extension $f(w)$:

$$f(w) = \sum_{k=1}^{p} w_{j_k} \left[ F(\{j_1, \ldots, j_k\}) - F(\{j_1, \ldots, j_{k-1}\}) \right]$$

for $w \in \mathbb{R}_+^p$ with components ordered as $w_{j_1} \geq w_{j_2} \geq \cdots \geq w_{j_p} \geq 0$ (1008.4220). The resulting polyhedral norm $\Omega(w) = f(|w|)$ provides a convex surrogate tailored to the desired interaction between sparsity and structure.

In the context of CP rank selection, $F$ can specifically penalize the inclusion of additional rank-one components (e.g., $F(A) = |A|$ yields the $\ell_1$ norm), or encode block/group structure, hierarchies, or couplings among factors to favor more structured low-rank solutions (1008.4220). The theoretical advantage is that the support of the minimizer under such penalties is a stable set for $F$, with support recovery consistency under general conditions.
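
As a minimal illustration, the sketch below (Python; the helper names `lovasz_extension` and `omega` are hypothetical) evaluates the Lovász extension by sorting the entries of $|w|$ and accumulating the marginal gains of $F$; with the cardinality function it reduces to the $\ell_1$ norm, as noted above.

```python
import numpy as np

def lovasz_extension(w, F):
    """Lovász extension f(w) of a nondecreasing submodular set function F
    (with F(empty set) = 0), evaluated at a nonnegative vector w."""
    w = np.asarray(w, dtype=float)
    order = np.argsort(-w)                   # indices j_1, ..., j_p with w_{j_1} >= ... >= w_{j_p}
    f, prev, active = 0.0, 0.0, []
    for j in order:
        active.append(int(j))
        gain = F(frozenset(active)) - prev   # marginal gain F({j_1..j_k}) - F({j_1..j_{k-1}})
        f += w[j] * gain
        prev += gain
    return f

def omega(w, F):
    """Polyhedral norm Omega(w) = f(|w|)."""
    return lovasz_extension(np.abs(w), F)

# With F(A) = |A| (cardinality) the norm reduces to the L1 norm:
card = lambda A: float(len(A))
w = np.array([0.5, -2.0, 0.0, 1.5])
print(omega(w, card), np.sum(np.abs(w)))     # both equal 4.0
```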

2. Hierarchical Bayesian and Group Priors

Hierarchical Bayesian frameworks provide sparse priors through Gaussian scale mixtures, where each parameter (e.g., a vector of CP factor weights or loadings) $\beta_j$ receives a prior:

  • $\beta_j \mid \tau_j^2 \sim \mathcal{N}(0, \tau_j^2)$,
  • $\tau_j^2 \sim \operatorname{Exp}(\lambda^2/2)$ (Exponential or Inverse-Gamma) (1009.1914).

Marginalizing over the scale parameter yields Laplace or generalized t-distributions. Placing an additional hierarchy—such as an inverse-gamma prior at the next layer—produces heavy-tailed, nonconvex sparsity-inducing priors that can adapt to both large signals and noise, introducing less bias for significant CP components (1009.1914). For grouped variables (e.g., all weights for a given CP component), group-level scales allow entire components to be “turned on/off,” inducing block sparsity and facilitating group-level CP rank selection.

The maximum a posteriori (MAP) estimate under such priors is computed by iterative reweighting or expectation-maximization (EM), with adaptive penalties calibrated by current parameter estimates. This approach is computationally efficient and naturally extends to the group sparse CP decomposition setting (1009.1914).
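A minimal sketch of this reweighting idea, assuming a plain linear model with grouped coefficients (one group per CP component); the routine name `map_group_sparse` is hypothetical, and the cited EM/MAP schemes differ in detail, but the alternation between a weighted ridge solve and adaptive penalties refreshed from current group norms is the same.

```python
import numpy as np

def map_group_sparse(y, X, groups, lam=1.0, n_iter=50, eps=1e-8):
    """MAP estimate under a hierarchical group prior via iterative reweighting:
    each pass solves a ridge problem whose per-group weights lam / (||b_g|| + eps)
    are refreshed from the current estimate -- the standard MM/EM-style surrogate
    for a block-sparse (group-L1) penalty that can switch whole groups off."""
    groups = np.asarray(groups)
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        w = np.empty(X.shape[1])
        for g in np.unique(groups):
            idx = np.flatnonzero(groups == g)
            w[idx] = lam / (np.linalg.norm(b[idx]) + eps)    # adaptive penalty per group
        b = np.linalg.solve(X.T @ X + np.diag(w), X.T @ y)   # weighted ridge solve
    return b

# Toy check: 3 groups of 2 coefficients, only group 0 truly active.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = X @ np.array([1.5, -2.0, 0, 0, 0, 0]) + 0.1 * rng.normal(size=100)
print(np.round(map_group_sparse(y, X, [0, 0, 1, 1, 2, 2]), 3))
```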

3. Automatic Relevance Determination and Fully Bayesian CP Rank Estimation

Automatic relevance determination (ARD) priors penalize columns of the factor matrices across all CP modes by assigning each component a common latent precision $\lambda_r$, with hierarchical Gamma hyperpriors:

  • For each mode $n$,

$$p(A^{(n)} \mid \lambda) = \prod_{i_n=1}^{I_n} \mathcal{N}\!\left(a_{i_n}^{(n)} \mid 0, \operatorname{diag}(\lambda_1^{-1}, \ldots, \lambda_R^{-1})\right)$$

  • $p(\lambda_r) = \operatorname{Gamma}(c_0, d_0)$ (Zhao et al., 2014).

As $\lambda_r \to \infty$, the $r$th component is shrunk to zero jointly across all modes. This yields a fully Bayesian CP decomposition with automatic rank selection: initialize with $R$ large, and the posterior inference prunes superfluous columns. Deterministic variational Bayesian (VB) algorithms provide closed-form parameter updates for all posteriors, scaling linearly with the number of observed entries. Predictive Student-t posteriors are naturally produced for imputation and uncertainty quantification in missing-data scenarios (Zhao et al., 2014).
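The following sketch is a simplified, MAP-style analogue of the ARD mechanism (alternating ridge-regularized factor updates with per-component precisions shared across modes, then pruning); it is not the full variational Bayes algorithm of the cited work, and the helper names (`cp_ard`, `khatri_rao`, `unfold`) are illustrative.

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding (C-order) of a dense tensor."""
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)

def khatri_rao(mats):
    """Column-wise Khatri-Rao product, consistent with the C-order unfolding above."""
    out = mats[0]
    for M in mats[1:]:
        out = np.einsum('ir,jr->ijr', out, M).reshape(-1, out.shape[1])
    return out

def cp_ard(T, R=10, n_iter=100, a0=1e-6, b0=1e-6, prune_thresh=1e6):
    """Simplified MAP-style analogue of ARD-regularized CP: alternate ridge-type
    factor updates with per-component precisions lambda_r shared across all modes,
    then prune components whose precision has diverged (column shrunk to ~0)."""
    N = T.ndim
    A = [np.random.randn(dim, R) for dim in T.shape]
    lam = np.ones(R)
    for _ in range(n_iter):
        for n in range(N):
            Z = khatri_rao([A[m] for m in range(N) if m != n])
            G = Z.T @ Z + np.diag(lam)                        # ARD precisions act as a ridge
            A[n] = np.linalg.solve(G, Z.T @ unfold(T, n).T).T
        col_sq = sum((M ** 2).sum(axis=0) for M in A)         # energy of component r over all modes
        lam = (a0 + 0.5 * sum(T.shape)) / (b0 + 0.5 * col_sq)
    keep = lam < prune_thresh
    return [M[:, keep] for M in A], int(keep.sum())

# Toy run: a noiseless rank-3 tensor, started deliberately over-complete with R=8;
# redundant components are typically driven toward zero and pruned.
rng = np.random.default_rng(0)
true = [rng.normal(size=(d, 3)) for d in (20, 18, 16)]
T = np.einsum('ir,jr,kr->ijk', *true)
factors, rank_est = cp_ard(T, R=8)
print(rank_est)
```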

4. Generalized Hyperbolic Priors for Robust Rank Estimation

Rigid choices of sparsity priors (e.g., Gaussian-gamma or Laplace) may fail for high-rank or low-SNR tensors. The generalized hyperbolic (GH) prior introduces additional flexibility, including parameters that tune both the sharpness at the origin and the tail behavior. The prior for each group (the $l$th column across all mode factor matrices for a CP component) is written as

$$\operatorname{GH}\left(\{U^{(n)}_{(:,l)}\}_{n=1}^N \mid a_l^0, b_l^0, \lambda_l^0\right)$$

with the Gaussian scale mixture representation:

$$p(\{U^{(n)}_{(:,l)}\}) = \int \mathcal{N}\!\left(\operatorname{vec}(\{U^{(n)}_{(:,l)}\}) \mid 0, z_l I\right) \operatorname{GIG}(z_l \mid a_l^0, b_l^0, \lambda_l^0) \, dz_l$$

This allows consistent and robust automatic pruning of components even in highly challenging settings, outperforming Gaussian-gamma priors in recovering both low and high tensor ranks at varying SNRs (Cheng et al., 2020). Variational Bayes inference yields closed-form updates for all parameters and latent variables, ensuring scalability.
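Because the scale-mixture updates hinge on posterior moments of the GIG variables, a small sketch of those moments is given below, assuming the common $(a, b, p)$ parameterization with density proportional to $z^{p-1} e^{-(az + b/z)/2}$ (the cited paper's argument ordering may differ; the function name is hypothetical).

```python
import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind K_nu

def gig_moments(a, b, p):
    """E[z] and E[1/z] of GIG(z | a, b, p); the expected inverse scale E[1/z_l]
    is what acts as the joint ridge weight on column l of every factor matrix,
    so a large value shrinks that CP component toward zero across all modes."""
    s = np.sqrt(a * b)
    ratio = kv(p + 1, s) / kv(p, s)
    e_z = np.sqrt(b / a) * ratio
    e_inv_z = np.sqrt(a / b) * ratio - 2 * p / b
    return e_z, e_inv_z

print(gig_moments(a=2.0, b=3.0, p=-0.5))
```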

5. Nonconvex, Polyhedral, and Adaptive Regularizers

Beyond convex relaxations, nonconvex sparsity-inducing regularizers such as group penalties $R(B) = \sum_i \xi_i \rho(\|b_i\|_2)$, with choices like the Geman function $\rho_{\text{GM}}(|x|) = |x|/(\theta + |x|)$, further sharpen support recovery and remove estimation bias for large coefficients (Zhao et al., 2018). These penalties more decisively eliminate unnecessary CP components when an over-complete parameterization is provided, and can be solved efficiently with alternating minimization and majorization–minimization algorithms.
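A brief sketch of the majorization-minimization step for the Geman group penalty: since $\rho_{\text{GM}}$ is concave on $[0, \infty)$, its tangent majorizer turns each step into a weighted group-$\ell_2$ problem whose weights vanish for strong components (function names are illustrative, not from the cited work).

```python
import numpy as np

def geman(x, theta=1.0):
    """Geman penalty rho_GM(x) = x / (theta + x), x >= 0: steep near the origin
    (sharp thresholding) but saturating for large x (little bias on strong components)."""
    return x / (theta + x)

def mm_group_weights(B, theta=1.0, xi=1.0):
    """One MM step for R(B) = sum_i xi_i * rho_GM(||b_i||_2): the tangent majorizer
    yields a weighted group-L2 problem with per-group weight
    xi_i * rho_GM'(||b_i||) = xi_i * theta / (theta + ||b_i||)**2."""
    norms = np.linalg.norm(B, axis=1)            # rows of B are the groups (CP components)
    return xi * theta / (theta + norms) ** 2

# Small components get weight ~ xi/theta (strong shrinkage, candidates for pruning);
# large components get a vanishing weight and hence negligible bias.
B = np.array([[2.0, -1.0, 0.5], [0.02, 0.01, 0.0]])
print(geman(np.linalg.norm(B, axis=1), theta=0.5))
print(mm_group_weights(B, theta=0.5))
```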

Recent work also develops frameworks to systematically generate sparsity-inducing regularizers with closed-form proximity or thresholding operators, enabling scalable optimization in both matrix and tensor (low-tubal-rank) completion problems. When applied to the singular values (or tubes), such regularizers act as nonconvex but computationally efficient rank surrogates, outperforming convex nuclear norm-based surrogates in many settings (Wang et al., 2023, Wang et al., 2023).

6. Calibration, Cross-Validation, and Information-Theoretic Model Selection

Model selection criteria addressing both sparsity and rank minimization must correctly adjust for the data-driven selection effect. Fixing regularization parameters (e.g., $\lambda$ in Lasso-type penalties) across CV folds can yield inconsistent selection of sparsity patterns and ranks (She et al., 2018). This motivates cross-validation on the structural selection-projection pattern (e.g., active and projected CP factors) rather than on penalty magnitude, with minimax-optimal and scale-free information criteria calibrated to match the theoretical error bound:

$$\text{Risk} \asymp \sigma^2 \left\{ [\min(q, J) + m - r] \cdot r + J \log (ep/J) \right\}$$

where $J$ is the active support size and $r$ is the (CP) rank. This framework ensures principled and reproducible rank and sparsity selection, bypassing the need for separate noise estimation (She et al., 2018).
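As a hedged illustration only (the exact criterion of She et al., 2018 differs in derivation and constants), one scale-free way to rank candidate selection-projection patterns is to trade a log residual term, which absorbs the unknown noise scale, against the complexity expression from the bound above; here $q$, $m$, $p$ are the same dimension parameters appearing in that bound, and the function names are hypothetical.

```python
import numpy as np

def complexity(J, r, q, m, p):
    """Complexity term matching the risk bound:
    [min(q, J) + m - r] * r + J * log(e * p / J)."""
    return (min(q, J) + m - r) * r + J * np.log(np.e * p / J)

def scale_free_ic(rss, n, J, r, q, m, p, A=2.0):
    """Illustrative scale-free criterion: log residual sum of squares (no separate
    noise estimate needed) plus A times the bound-matched complexity of the pattern."""
    return 0.5 * n * np.log(rss / n) + A * complexity(J, r, q, m, p)

# Candidates are (support size J, rank r, residual sum of squares from the refit):
# best = min(candidates, key=lambda c: scale_free_ic(c[2], n, c[0], c[1], q, m, p))
```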

7. Trade-offs, Limitations, and Practical Recommendations

Practical deployment of sparsity-inducing priors for CP rank selection must address the trade-off between sparsity (support size, parsimony) and the risk of discarding significant components. Iterative or cutting-plane strategies that incrementally enforce rank constraints or refine penalties enable exploration of the Pareto front between sparsity and model complexity (Fampa et al., 2020). While convex (polyhedral) penalties afford strong theoretical guarantees and scalable algorithms, nonconvex approaches can further enhance estimation accuracy but pose challenges with local minima and initialization sensitivity.

The choice of prior or penalty should be informed by empirical testing: moment-based or kurtosis-based tests can diagnose deviation from Laplace ($\ell_1$) assumptions and prompt adaptive switching to $\ell_q$ or other generalized power priors (Griffin et al., 2017).
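A crude illustration of such a diagnostic, using the fact that the Laplace distribution has excess kurtosis 3; this is not the formal test of the cited work, and the thresholding rule below is an assumption for demonstration.

```python
import numpy as np
from scipy.stats import kurtosis

def laplace_kurtosis_check(coefs, tol=1.0):
    """Moment-based sanity check: compare sample excess kurtosis of coefficient
    estimates with the Laplace value of 3 to flag mismatch with an L1-type prior."""
    ek = kurtosis(coefs, fisher=True)   # excess kurtosis: 0 for Gaussian, 3 for Laplace
    if ek > 3 + tol:
        return ek, "heavier tails than Laplace: consider an l_q prior with q < 1"
    if ek < 3 - tol:
        return ek, "lighter tails than Laplace: L1 may over-penalize large components"
    return ek, "consistent with the Laplace (L1) assumption"

rng = np.random.default_rng(1)
print(laplace_kurtosis_check(rng.laplace(size=5000)))
```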

Summary Table: Methodological Approaches

| Technique | Principal Feature | CP Rank Selection Mechanism |
| --- | --- | --- |
| Submodular/Lovász extensions | Structured convex surrogate via set function $F$ | Polyhedral-norm sparsity, support recovery |
| Hierarchical Bayesian (HAL, ARD) | Gaussian scale mixtures; group and adaptive penalties | Group-wise shrinkage, MAP, Bayesian pruning |
| Generalized hyperbolic (GH) | Flexible, heavy-tailed prior via Gaussian scale mixtures | Robust ARD; improved high-rank/low-SNR recovery |
| Nonconvex regularizers | Bias-reduced, sharp thresholding (e.g., Geman, closed-form prox) | Aggressive component elimination, efficiency |
| Cross-validation/information criteria | Calibrated, scale-free, selection-pattern-based CV | Ranking by structural error, minimax optimal |

The construction and calibration of sparsity-inducing priors for CP rank selection synthesizes convex geometry, submodular analysis, Bayesian inference, and algorithmic optimization. These techniques collectively enable both accurate rank estimation and robust, interpretable CP decompositions in practical multiway data analysis.