Poisson Canonical Polyadic Tensor Model
- The PCP tensor model is a statistical framework that represents count data as independent Poisson random variables structured via nonnegative CP decompositions.
- It utilizes EM and majorization-minimization algorithms, along with hybrid stochastic–deterministic methods, to efficiently estimate factor matrices.
- Practical strategies include sparse tensor storage and efficient computations, ensuring scalability and reliable parameter inference for high-dimensional data.
The Poisson Canonical Polyadic (PCP) tensor model is a statistical framework for the multilinear decomposition of count-valued tensors. It posits that the observed entries of a multiway array are independent Poisson random variables whose mean parameters admit a nonnegative Canonical Polyadic (CP) structure. This model has been foundational in extending nonnegative matrix factorization (NMF) and low-rank tensor approximations to sparse count data commonly encountered in genomics, signal processing, text analysis, chemometrics, and other domains. Theoretical and algorithmic advancements underpinning PCP encompass generative modeling, Expectation-Maximization (EM) interpretations, majorization-minimization algorithms, Fisher information analysis, and hybrid stochastic–deterministic optimization.
1. Generative Model and Likelihood Formulation
Let $\mathcal{X} = (x_{i_1 \cdots i_d}) \in \mathbb{Z}_{\ge 0}^{n_1 \times \cdots \times n_d}$ denote an observed $d$-way count tensor. The PCP model assumes

$$x_{i_1 \cdots i_d} \sim \operatorname{Poisson}(m_{i_1 \cdots i_d}) \quad \text{independently},$$

with Poisson means given as

$$m_{i_1 \cdots i_d} = \sum_{r=1}^{R} \lambda_r \prod_{k=1}^{d} a^{(k)}_{i_k r},$$

where $A^{(k)} = \big(a^{(k)}_{ir}\big) \in \mathbb{R}_{\ge 0}^{n_k \times R}$, $R$ is the CP rank, and all factor matrices are constrained to be nonnegative. In Kruskal notation,

$$\mathcal{M} = [\![\, \lambda;\, A^{(1)}, \ldots, A^{(d)} \,]\!].$$

The log-likelihood (up to additive constants independent of the parameters) is

$$\ell\big(\lambda, A^{(1)}, \ldots, A^{(d)}\big) = \sum_{i_1, \ldots, i_d} \Big( x_{i_1 \cdots i_d} \log m_{i_1 \cdots i_d} - m_{i_1 \cdots i_d} \Big).$$

Maximum likelihood estimation (MLE) for the factor matrices thus amounts to minimizing the generalized Kullback–Leibler (KL) divergence between $\mathcal{X}$ and $\mathcal{M}$.
This formulation replaces the classical least-squares (Gaussian) objective, which is ill-suited for sparse count data where the Poisson assumption correctly models the discrete, nonnegative, and frequently zero-inflated nature of the data (Chi et al., 2011, Llosa-Vite et al., 7 Nov 2025, Myers et al., 2022).
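To make the formulation concrete, the following minimal sketch simulates a small dense PCP tensor and evaluates the constant-free Poisson log-likelihood; the shapes, rank, and weights are illustrative choices, not values from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
shape, R = (10, 8, 6), 3

# Nonnegative factors with simplex-normalized columns, plus component weights.
factors = [rng.random((n, R)) for n in shape]
factors = [A / A.sum(axis=0) for A in factors]
lam = np.array([50.0, 30.0, 20.0])

# CP model tensor: m_{ijk} = sum_r lam_r * a_{ir} * b_{jr} * c_{kr}.
M = np.einsum('r,ir,jr,kr->ijk', lam, *factors)

# Observed counts: independent Poisson draws with means M.
X = rng.poisson(M)

# Log-likelihood up to constants: sum_i (x_i log m_i - m_i);
# zero counts contribute only the -m_i term.
ll = np.where(X > 0, X * np.log(M), 0.0).sum() - M.sum()
print(f"log-likelihood (up to constants): {ll:.2f}")
```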
2. Latent Variable Formulation and the EM Connection
A core insight is that the PCP model arises by marginalizing over an unobserved $(d+1)$-way latent tensor $\mathcal{Z} = (z_{i_1 \cdots i_d r})$ defined as

$$z_{i_1 \cdots i_d r} \sim \operatorname{Poisson}\Big( \lambda_r \prod_{k=1}^{d} a^{(k)}_{i_k r} \Big) \quad \text{independently},$$

with

$$x_{i_1 \cdots i_d} = \sum_{r=1}^{R} z_{i_1 \cdots i_d r}.$$

The complete-data log-likelihood,

$$\ell_c = \sum_{i_1, \ldots, i_d} \sum_{r=1}^{R} \Big( z_{i_1 \cdots i_d r} \log \Big( \lambda_r \prod_{k=1}^{d} a^{(k)}_{i_k r} \Big) - \lambda_r \prod_{k=1}^{d} a^{(k)}_{i_k r} \Big),$$

can be maximized in closed form with respect to each factor matrix if $\mathcal{Z}$ were observed (Llosa-Vite et al., 7 Nov 2025). In practice, algorithms proceed in the EM framework: the E-step computes

$$\hat{z}_{i_1 \cdots i_d r} = \mathbb{E}\big[ z_{i_1 \cdots i_d r} \mid x_{i_1 \cdots i_d} \big] = x_{i_1 \cdots i_d} \, \frac{\lambda_r \prod_{k} a^{(k)}_{i_k r}}{m_{i_1 \cdots i_d}},$$

and the M-step maximizes the expected complete-data log-likelihood with respect to each $A^{(k)}$ (and $\lambda$), yielding normalized updates. This correspondence clarifies that canonical multiplicative updates for Poisson NMF and PCP, including Lee & Seung for $d = 2$ and CP-APR for arbitrary $d$, are EM or Generalized EM algorithms (Llosa-Vite et al., 7 Nov 2025, Chi et al., 2011).
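The E- and M-steps above admit a compact dense implementation for a 3-way tensor; this is a minimal sketch of one EM sweep, with all names illustrative rather than drawn from the cited papers.

```python
import numpy as np

def em_step(X, lam, factors):
    """One EM sweep for a dense 3-way PCP model with simplex-normalized factors."""
    # Per-entry, per-component intensities lam_r * a_{ir} b_{jr} c_{kr}: (I, J, K, R).
    comp = np.einsum('r,ir,jr,kr->ijkr', lam, *factors)
    M = comp.sum(axis=-1)                          # Poisson means m_i
    # E-step: expected latent allocations z-hat = x * comp / m.
    Z = X[..., None] * comp / M[..., None]
    # M-step: lambda_r is the total allocation to component r; each factor
    # column is the corresponding normalized mode marginal of Z.
    lam_new = Z.sum(axis=(0, 1, 2))
    factors_new = [Z.sum(axis=tuple(m for m in range(3) if m != k)) / lam_new
                   for k in range(3)]
    return lam_new, factors_new
```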
3. Algorithms: Alternating Poisson Regression and Hybrid Optimization
3.1 Majorization-Minimization (CP-APR)
The CP-APR algorithm (Chi et al., 2011) employs nonlinear block Gauss–Seidel iteration: for each mode $k$, fix all other factor matrices and solve the (strictly convex) subproblem

$$\min_{B \ge 0} \; f(B) = \mathbf{1}^\top (B \Pi^{(k)}) \mathbf{1} - \mathbf{1}^\top \big( X_{(k)} \circledast \log(B \Pi^{(k)}) \big) \mathbf{1},$$

where $X_{(k)}$ is the mode-$k$ unfolding of $\mathcal{X}$ and $\Pi^{(k)} = \big( A^{(d)} \odot \cdots \odot A^{(k+1)} \odot A^{(k-1)} \odot \cdots \odot A^{(1)} \big)^\top$ is the (transposed) Khatri–Rao product of all other factors. The majorization-minimization (MM) update for $B$ (the mode-$k$ factor with the weights $\lambda$ absorbed) is

$$B \leftarrow B \circledast \Big[ \big( X_{(k)} \oslash (B \Pi^{(k)}) \big) \, \Pi^{(k)\top} \Big],$$

with elementwise division $\oslash$, entrywise multiplication $\circledast$, and normalization to enforce simplex constraints. These iterations decrease the KL objective monotonically, with convergence to KKT points guaranteed under full row-rank on the support (Chi et al., 2011).
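A minimal sketch of the inner MM iterations for one mode, assuming dense unfoldings and a C-order unfolding convention (so the Khatri–Rao below matches numpy's reshape); this is illustrative, not the reference implementation of CP-APR.

```python
import numpy as np

def khatri_rao(A, B):
    """Columnwise Khatri-Rao product: (J x R) and (K x R) -> (J*K x R)."""
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def cp_apr_mode(Xk, B, Pi, n_inner=10, eps=1e-10):
    """Inner MM updates for one mode: B <- B * ((Xk / (B @ Pi)) @ Pi.T).

    Xk: mode-k unfolding of the data, B: mode-k factor with the weights
    lambda absorbed, Pi: transposed Khatri-Rao product of the other factors.
    """
    for _ in range(n_inner):
        Phi = (Xk / np.maximum(B @ Pi, eps)) @ Pi.T   # elementwise divide
        B = B * Phi                                   # entrywise multiply
    return B

# For mode 0 of a 3-way tensor X with factors A0, A1, A2 and weights lam:
#   Pi = khatri_rao(A1, A2).T          # (R, J*K), matches C-order unfolding
#   Xk = X.reshape(X.shape[0], -1)     # mode-0 unfolding
#   A0 = cp_apr_mode(Xk, A0 * lam, Pi)
```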
3.2 Hybrid Stochastic–Deterministic Methods
Hybrid GCP–CPAPR (HybridGC) alternates between stochastic GCP-Adam updates and deterministic CPAPR steps (Myers et al., 2022). The stochastic phase ("heating") uses minibatch gradients and adaptive stepsizes to escape poor local minima, while the deterministic phase ("cooling") accelerates convergence to the MLE. Restarted CPAPR with SVDrop periodically computes singular values of the mode-$k$ unfoldings; a sudden drop signals potential rank-deficient local minima, triggering random restarts. Empirical results indicate that HybridGC increases the probability of converging to the global MLE, with SVDrop further eliminating degenerate solutions (Myers et al., 2022). A schematic of the alternation appears below.
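The alternation itself reduces to a simple control loop; the sketch below shows only that schedule, with the two solver callbacks as hypothetical stand-ins for GCP-Adam epochs and CP-APR sweeps, which are not implemented here.

```python
def hybrid_gc(model, stochastic_epoch, deterministic_sweep,
              n_rounds=5, heat_epochs=3, cool_sweeps=20):
    """Alternate stochastic 'heating' and deterministic 'cooling' phases.

    stochastic_epoch / deterministic_sweep are placeholders for one
    minibatch GCP-Adam epoch and one full CP-APR MM sweep, respectively.
    """
    for _ in range(n_rounds):
        for _ in range(heat_epochs):        # heating: escape poor local minima
            model = stochastic_epoch(model)
        for _ in range(cool_sweeps):        # cooling: fast convergence near MLE
            model = deterministic_sweep(model)
    return model
```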
3.3 Algorithmic Safeguards
To prevent convergence to non-KKT boundary points due to irrecoverable zero entries, a "scooching" procedure increments small inadmissible zeros into the positive orthant: whenever an entry $b_{ir}$ is at (or numerically near) zero while its multiplicative factor satisfies $\phi_{ir} > 1$, so that increasing $b_{ir}$ would decrease the objective, a small constant $\kappa > 0$ is added to $b_{ir}$ (Chi et al., 2011). This guarantees that updates remain interior, crucial for both theoretical convergence and practical stability.
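In the MM notation above, the safeguard can be sketched as follows, with the tolerance and increment as illustrative values.

```python
import numpy as np

def scooch(B, Phi, kappa=1e-10, tol=1e-10):
    """Nudge inadmissible zeros of the factor B into the positive orthant.

    An entry b_ir ~ 0 whose multiplicative factor phi_ir exceeds 1 violates
    the KKT conditions (increasing b_ir would lower the objective), yet a
    multiplicative update cannot move it off zero, so we add kappa.
    """
    mask = (B < tol) & (Phi > 1.0)
    return np.where(mask, B + kappa, B)
```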
4. Computational Strategies for Sparse Tensors
Efficiency for large, sparse count arrays is achieved by never forming the full dense model tensor or Khatri–Rao products explicitly. Instead:
- Store $\mathcal{X}$ in coordinate (COO) format with its nonzero indices and values.
- For each nonzero, compute its associated partial product vector across all modes except the current one.
- Accumulate required sums only for observed nonzeros.
- Per-iteration work and working storage scale linearly in the number of nonzeros of $\mathcal{X}$ (Chi et al., 2011). This approach enables scalable decomposition of very sparse high-dimensional data, as demonstrated in empirical studies on tensors whose nonzeros make up only a minuscule fraction of their total entries (Myers et al., 2022). A sketch of the nonzero-only evaluation follows this list.
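A minimal sketch of these nonzero-only computations, assuming COO storage as an (nnz x d) integer index array plus a value array; the function names are illustrative.

```python
import numpy as np

def model_at_nonzeros(indices, lam, factors):
    """Evaluate CP model values m_i only at the observed nonzero indices."""
    nnz = indices.shape[0]
    vals = np.tile(lam, (nnz, 1))           # (nnz, R): start from weights
    for k, A in enumerate(factors):         # gather only the rows needed
        vals *= A[indices[:, k], :]
    return vals.sum(axis=1)                 # sum over the R components

def model_total(lam, factors):
    """Sum of all model entries, computed without forming the dense tensor:
    sum(M) = sum_r lam_r * prod_k (column-r sum of factor k)."""
    colsums = np.prod([A.sum(axis=0) for A in factors], axis=0)
    return float(lam @ colsums)
```

Together these give the full KL objective, since the Poisson log-likelihood needs $\log m_i$ only where $x_i > 0$ and the remaining term is the total sum of the model tensor.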
5. Parameter Inference: Fisher Information and Identifiability
Recent developments leverage the latent-variable formulation to compute both observed and expected Fisher information matrices for the PCP model (Llosa-Vite et al., 7 Nov 2025). Each block of the Fisher information reflects the parameter covariances across different modes and CP components, and the expected information is expressed through generalized contracted products over the auxiliary (latent) dimensions. The expected information is shown to be rank-deficient, with the deficiency accounted for by the CP scale/rotation indeterminacies. Identifiability and well-posedness depend on the balance between the data size (the number of tensor entries) and the effective parameter count (the total number of factor parameters net of these indeterminacies); if the model is under-determined, the MLE is non-unique (Llosa-Vite et al., 7 Nov 2025).
For the rank-1 PCP, the log-likelihood and Fisher information simplify substantially, allowing closed-form MLEs:

$$\hat{\lambda} = \sum_{i_1, \ldots, i_d} x_{i_1 \cdots i_d}, \qquad \hat{a}^{(k)} = \frac{1}{\hat{\lambda}} \, X_{(k)} \mathbf{1},$$

where $X_{(k)}$ is the mode-$k$ unfolding of $\mathcal{X}$ and $\mathbf{1}$ the all-ones vector, so each estimated factor is the normalized mode-$k$ marginal. The variances and covariances of the parameter estimates are given by blocks of the inverse Fisher information matrix, facilitating exact parametric inference in the simplest case (Llosa-Vite et al., 7 Nov 2025).
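A minimal sketch of this closed form, assuming the marginal-sum expression reconstructed above: the total count estimates $\hat{\lambda}$ and each normalized mode marginal estimates $\hat{a}^{(k)}$.

```python
import numpy as np

def rank1_pcp_mle(X):
    """Closed-form rank-1 PCP MLE: total count plus mode marginal frequencies."""
    s = X.sum()                                     # lambda-hat
    d = X.ndim
    factors = [X.sum(axis=tuple(m for m in range(d) if m != k)) / s
               for k in range(d)]                   # a-hat^(k), simplex-valued
    return s, factors
```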
6. Empirical Performance and Practical Applications
Empirical studies have established the efficiency and robustness of CP-APR and hybrid stochastic-deterministic schemes for large-scale, sparse count data (Myers et al., 2022). On synthetic and real-world tensors, HybridGC converges to within a small relative loss error of the (empirical) MLE in 96.7% of trials, outperforming pure CP-APR and stochastic GCP-Adam alone. SVDrop restarts nearly eliminate convergence to degenerate solutions, raising success rates to over 99.95% in tested benchmarks. Performance metrics include the relative loss error, the probability of near-MLE convergence, and the Factor Match Score (FMS) for basis recovery.
Practical implications include:
- Monotonic objective decrease and convergence guarantees under mild genericity conditions.
- Algorithmic scalability with space and time linear in the number of nonzeros.
- Model selection guidance from Fisher information, flagging under-determined or non-identifiable settings.
7. Theoretical Significance and Ongoing Research Directions
The PCP model provides a rigorous statistical foundation for multiway low-rank decompositions of count data, generalizing NMF principles to higher dimensions and aligning model assumptions with the intrinsic sampling noise of discrete data. Current work explores:
- Non-iterative MLEs via latent-variable closure in special cases.
- Precise characterizations of identifiability and inherent ambiguities through Fisher information and its rank.
- Algorithmic extensions incorporating regularization, missing data, and streaming updates.
- Improved heuristics for rank selection based on statistical information content.
A plausible implication is that the intersection of latent-variable methods, MM/EM algorithms, and Fisher information theory offers a roadmap for provably optimal estimation and uncertainty quantification in high-dimensional count tensor decompositions; this is especially pertinent as data complexity and sparsity increase (Llosa-Vite et al., 7 Nov 2025, Chi et al., 2011, Myers et al., 2022).